I want to delete items from a vector by index in C++, but my code is too slow.
cpp version
long remove_cnt = 0;
for (auto &remove_idx : remove_index) {
mylist.erase(mylist.begin() + (long) remove_idx - remove_cnt);
remove_cnt++;
}
python version
new_mylist = [item for i, item in enumerate(mylist) if i not in remove_index]
I expected C++ to be faster than Python, but my C++ code is much slower than the Python code. Is there a more efficient way to do this in C++?
Your question is a good example of why a 1-1 translation between languages usually doesn't work.
To efficiently erase items from a vector you don't do it by index.
Assuming you got your indices in Python by evaluating some condition (a predicate), you can use that predicate directly in C++.
Say you want to remove all ints > 4 then the code looks like this:
#include <algorithm>
#include <iostream>
#include <vector>
bool greater_then_4(const int value)
{
return value > 4;
}
int main()
{
std::vector<int> values{ 1, 2, 3, 4, 5, 6, 7, 8 };
// https://en.cppreference.com/w/cpp/algorithm/remove
// remove all values greater than 4. remove_if moves the values to keep to the front
// and returns an iterator to the new logical end
auto it = std::remove_if(values.begin(), values.end(), greater_then_4);
// shrink the vector to only include the items not matching the predicate
values.erase(it, values.end());
for (const auto value : values)
{
std::cout << value << " ";
}
return 0;
}
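If the list of indices is all you have, the same single-pass idea still applies. The following is a minimal sketch, not part of the original answer: erase_at_sorted_indices is a hypothetical helper, and it assumes the indices are sorted in ascending order, unique, and within range. Each kept element is moved at most once, instead of once per erase call.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: erase the elements of `values` whose positions are
// listed in `sorted_indices`.
// Assumption: `sorted_indices` is ascending, has no duplicates, and every
// index is smaller than values.size().
template <class T>
void erase_at_sorted_indices(std::vector<T>& values,
                             const std::vector<std::size_t>& sorted_indices) {
    std::size_t write = 0;  // next position to fill with a kept element
    std::size_t next = 0;   // next entry of sorted_indices to match
    for (std::size_t read = 0; read < values.size(); ++read) {
        if (next < sorted_indices.size() && sorted_indices[next] == read) {
            ++next;  // this element is to be removed, so skip it
            continue;
        }
        if (write != read) {
            values[write] = std::move(values[read]);
        }
        ++write;
    }
    // Every index must have been matched, otherwise the assumption was violated.
    assert(next == sorted_indices.size());
    // Drop the leftover tail in one call.
    values.erase(values.begin() + static_cast<std::ptrdiff_t>(write), values.end());
}
```

Calling it on { 1, 2, 3, 4, 5, 6, 7, 8 } with the indices { 1, 3, 6 } leaves { 1, 3, 5, 6, 8 }.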
Related
Is there a faster way to do the following in C++, so that I can outperform the Python implementation?
Get intersection of two map/unordered_map keys
For these intersected keys, compute the pairwise difference between elements of their respective set/unordered_set
Some info that might be useful:
hash_DICT1 has about O(10000) keys, and about O(10) elements in each set.
hash_DICT2 has about O(1000) keys, and about O(1) elements in each set.
For example:
map <int,set<int>> hash_DICT1;
hash_DICT1[1] = {1,2,3};
hash_DICT1[2] = {4,5,6};
map <int,set<int>> hash_DICT2;
hash_DICT2[1] = {11,12,13};
hash_DICT2[3] = {4,5,6};
vector<int> output_vector
= GetPairDiff(hash_DICT1, hash_DICT2)
= [11-1,12-1,13-1,
11-2,12-2,13-2,
11-3,12-3,13-3] // only key 1 is in the intersection, so only compute the pairwise differences of the respective set elements.
= [10, 11, 12,
9, 10, 11,
8, 9, 10] // Note that I do want to keep duplicates, if any. Order does not matter.
GetPairDiff function.
vector<int> GetPairDiff(
unordered_map <int, set<int>> &hash_DICT1,
unordered_map <int, set<int>> &hash_DICT2) {
// Init
vector<int> output_vector;
int curr_key;
set<int> curr_set1, curr_set2;
// Get intersection
for (const auto &KEY_SET:hash_DICT2) {
curr_key = KEY_SET.first;
// Find pairwise difference
if (hash_DICT1.count(curr_key) > 0){
curr_set1 = hash_DICT1[curr_key];
curr_set2 = hash_DICT2[curr_key];
for (auto it1=curr_set1.begin(); it1 != curr_set1.end(); ++it1) {
for (auto it2=curr_set2.begin(); it2 != curr_set2.end(); ++it2) {
output_vector.push_back(*it2 - *it1);
}
}
}
}
return output_vector;
}
main run
int main (int argc, char ** argv) {
// Using unordered_map
unordered_map <int,set<int>> hash_DICT_1;
hash_DICT_1[1] = {1,2,3};
hash_DICT_1[2] = {4,5,6};
unordered_map <int,set<int>> hash_DICT_2;
hash_DICT_2[1] = {11,12,13};
hash_DICT_2[3] = {4,5,6};
GetPairDiff(hash_DICT_1, hash_DICT_2);
}
Compiled like this
g++ -o ./CompareRunTime.out -Ofast -Wall -Wextra -std=c++11
Other data structures are welcome, such as map or unordered_set.
However, I did try all 4 permutations and found that the one given by GetPairDiff runs the fastest, but nowhere near as fast as the Python implementation:
hash_DICT1 = { 1 : {1,2,3}, 2 : {4,5,6} }
hash_DICT2 = { 1 : {11,12,13}, 3 : {4,5,6} }
def GetPairDiff(hash_DICT1, hash_DICT2):
vector = []
for element in hash_DICT1.keys() & hash_DICT2.keys():
vector.extend(
[db_t-qry_t
for qry_t in hash_DICT2[element]
for db_t in hash_DICT1[element] ])
return vector
output_vector = GetPairDiff(hash_DICT1, hash_DICT2)
Performance comparison:
python : 0.00824 s
c++ : 0.04286 s
The C++ implementation takes about 5 times as long as the Python one!
You do a lot of copying where you should be using const&.
You don't save search results. You should use find instead of count and then use the result.
push_back to a vector may be made faster by reserve()ing the number of elements you need to store if you know the number in advance.
Fixing these issues could result in something like this (requires C++17):
#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
using container = std::unordered_map<int, std::unordered_set<int>>;
std::vector<int> GetPairDiff(const container& hash_DICT1,
const container& hash_DICT2) {
// Init
std::vector<int> output_vector;
// Get intersection
for(auto& [curr_key2, curr_set2] : hash_DICT2) {
// use find() instead of count()
if(auto it1 = hash_DICT1.find(curr_key2); it1 != hash_DICT1.end()) {
auto& curr_set1 = it1->second;
// Reserve the space you know you'll need for this iteration. Note:
// This might be a pessimizing optimization so try with and without it.
output_vector.reserve(curr_set1.size() * curr_set2.size() +
output_vector.size());
// Calculate pairwise difference
for(auto& s1v : curr_set1) {
for(auto& s2v : curr_set2) {
output_vector.emplace_back(s2v - s1v);
}
}
}
}
return output_vector;
}
int main() {
container hash_DICT1{{1, {1, 2, 3}},
{2, {4, 5, 6}}};
container hash_DICT2{{1, {11, 12, 13}},
{3, {4, 5, 6}}};
auto result = GetPairDiff(hash_DICT1, hash_DICT2);
for(int v : result) {
std::cout << v << '\n';
}
}
This is more than 8 times as fast as the python version for these containers on my computer compiled with g++ -std=c++17 -O3.
Here's a C++11 version of the same program:
#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
using container = std::unordered_map<int, std::unordered_set<int>>;
std::vector<int> GetPairDiff(const container& hash_DICT1,
const container& hash_DICT2) {
// Init
std::vector<int> output_vector;
// Get intersection
for(auto& curr_pair2 : hash_DICT2) {
auto& curr_key2 = curr_pair2.first;
auto& curr_set2 = curr_pair2.second;
// use find() instead of count()
auto it1 = hash_DICT1.find(curr_key2);
if(it1 != hash_DICT1.end()) {
auto& curr_set1 = it1->second;
// Reserve the space you know you'll need for this iteration. Note:
// This might be a pessimizing optimization so try with and without it.
output_vector.reserve(curr_set1.size() * curr_set2.size() +
output_vector.size());
// Calculate pairwise difference
for(auto& s1v : curr_set1) {
for(auto& s2v : curr_set2) {
output_vector.emplace_back(s2v - s1v);
}
}
}
}
return output_vector;
}
int main() {
container hash_DICT1{{1, {1, 2, 3}},
{2, {4, 5, 6}}};
container hash_DICT2{{1, {11, 12, 13}},
{3, {4, 5, 6}}};
auto result = GetPairDiff(hash_DICT1, hash_DICT2);
for(int v : result) {
std::cout << v << '\n';
}
}
I am benchmarking a nearest-neighbour search over a set of data points. My C++ and Python implementations take almost the same execution time. Shouldn't C++ perform better than the raw Python implementation?
C++ Execution Time : 8.506 seconds
Python Execution Time : 8.7202 seconds
C++ Code:
#include <iostream>
#include <random>
#include <map>
#include <cmath>
#include <numeric> // std::iota
#include <algorithm>
#include <chrono>
#include <vector>
using namespace std;
using namespace std::chrono;
double edist(double* arr1, double* arr2, uint n) {
double sum = 0.0;
for (int i=0; i<n; i++) {
sum += pow(arr1[i] - arr2[i], 2);
}
return sqrt(sum); }
template <typename T> vector<size_t> argsort(const vector<T> &v) {
// initialize original index locations
vector<size_t> idx(v.size()); iota(idx.begin(), idx.end(), 0);
// sort indexes based on comparing values in v
sort(idx.begin(), idx.end(),
[&v](size_t i1, size_t i2) {return v[i1] < v[i2];});
return std::vector<size_t>(idx.begin() + 1, idx.end()); }
int main() {
uint N, M;
// cin >> N >> M;
N = 1000;
M = 800;
double **arr = new double*[N];
std::random_device rd; // obtain a random number from hardware
std::mt19937 eng(rd()); // seed the generator
std::uniform_real_distribution<> distr(10.0, 60.0);
for (int i = 0; i < N; i++) {
arr[i] = new double[M];
for(int j=0; j < M; j++) {
arr[i][j] = distr(eng);
}
}
auto start = high_resolution_clock::now();
map<int, vector<size_t> > dist;
for (int i=0; i<N; i++) {
vector<double> distances;
for(int j=0; j<N; j++) {
distances.push_back(edist(arr[i], arr[j], M)); // each row holds M values, not N
}
dist[i] = argsort(distances);
}
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop-start);
int dur = duration.count();
cout<<"Time taken by code: "<<dur<<" microseconds"<<endl;
cout<<" In seconds: "<<dur/pow(10,6);
return 0; }
Python Code:
import time
import numpy as np
def comp_inner_raw(i, x):
res = np.zeros(x.shape[0], dtype=np.float64)
for j in range(x.shape[0]):
res[j] = np.sqrt(np.sum((i-x[j])**2))
return res
def nearest_ngbr_raw(x): # x = [[1,2,3],[4,5,6],[7,8,9]]
#print("My array: ",x)
dist = {}
for idx,i in enumerate(x):
#lst = []
lst = comp_inner_raw(i,x)
s = np.argsort(lst)#[1:]
sorted_array = np.array(x)[s][1:]
dist[idx] = s[1:]
return dist
arr = np.random.rand(1000, 800)
start = time.time()
table = nearest_ngbr_raw(arr)
print("Time taken to execute the code using raw python is {}".format(time.time()-start))
Compile Command:
g++ -std=c++11 knn.cpp -o knn
C++ compiler(g++) version for ubuntu 18.04.1: 7.4.0
Coded in c++11
Numpy version : 1.16.2
Edit
With compiler optimization enabled, it now takes around 1 second.
Can this c++ code be optimized further from coding or any other perspective?
I can see at least three optimisations. The first two are easy and should definitely be done but in my testing they end up not impacting the runtime measurably. The third one requires rethinking the code minimally.
edist calculates a costly square root, but you are only using the distance for pairwise comparison. Since the square root function is monotonically increasing, it has no impact on the comparison result. Similarly, pow(x, 2) can be replaced with x * x, which is sometimes faster:
double edist(std::vector<double> const& arr1, std::vector<double> const& arr2, uint n) {
double sum = 0.0;
for (unsigned int i = 0; i < n; i++) {
auto const diff = arr1[i] - arr2[i];
sum += diff * diff;
}
return sum;
}
argsort performs a copy because it returns the indices excluding the first element. If you instead include the first element (change the return statement to return idx;), you avoid a potentially costly copy.
Your matrix is represented as a nested array (and you’re for some reason using raw pointers instead of a nested std::vector). It’s generally more efficient to represent matrices as contiguous N*M arrays: std::vector<double> arr(N * M);. This is also how numpy represents matrices internally. This requires changing the code to calculate the indices.
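A minimal sketch of the contiguous layout from the third point, under the assumption that the matrix is stored row by row (row-major) so that element (i, j) lives at index i * cols + j. The name squared_dist and its parameters are hypothetical, and it returns the squared distance, in line with the first point:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: squared Euclidean distance between rows i and j of a
// row-major matrix stored in one contiguous vector of rows * cols doubles.
double squared_dist(const std::vector<double>& data, std::size_t cols,
                    std::size_t i, std::size_t j) {
    // Both rows must lie completely inside the buffer.
    assert((i + 1) * cols <= data.size());
    assert((j + 1) * cols <= data.size());
    const double* a = data.data() + i * cols;  // start of row i
    const double* b = data.data() + j * cols;  // start of row j
    double sum = 0.0;
    for (std::size_t k = 0; k < cols; ++k) {
        const double diff = a[k] - b[k];
        sum += diff * diff;
    }
    return sum;
}
```

With this layout the N x M matrix is a single allocation, std::vector<double> arr(N * M), and each row is read from consecutive memory, which tends to be friendlier to the cache than N separate allocations.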
Is there a way to shift array elements in C++ without using any loop, like the Python code below, which shifts the elements of the list just by manipulating list indices?
def rotate(lst, n):
n = n % len(lst)
return lst[n:] + lst[:n]
> rotate([1,2,3,4,5], 1) # rotate forward
[2, 3, 4, 5, 1]
C++ standard algorithms also work with arrays, so you can just use std::rotate or std::rotate_copy.
The functions' interfaces are a bit more complex than rotation in your Python example, though. You have to provide, as a second argument, an iterator to the element which will become the first element in the resulting array.
For an array { 1, 2, 3, 4, 5 } and a forward rotation by one element, that would be the second element (the "2"). You get an iterator to that element by adding 1 to an iterator to the array's first element, for example array.begin() + 1, assuming that you use std::array, or array + 1 if it's a raw array.
#include <iostream>
#include <algorithm>
#include <array>
int main()
{
std::array<int, 5> array = { 1, 2, 3, 4, 5 };
std::rotate(
array.begin(),
array.begin() + 1,
array.end()
);
for (auto&& element : array)
{
std::cout << element << "\n";
}
}
If you want an interface like in your Python code, then you can wrap std::rotate in a function of your own and provide an int parameter. This is also a nice opportunity to make the whole thing more reusable by creating a generic function which can be used with any suitable container:
#include <iostream>
#include <algorithm>
#include <array>
#include <vector>
#include <list>
#include <iterator> // std::begin, std::end, std::advance
template <class Container>
void rotate(Container& container, int n)
{
using std::begin;
using std::end;
auto new_begin = begin(container);
std::advance(new_begin, n);
std::rotate(
begin(container),
new_begin,
end(container)
);
}
int main()
{
std::array<int, 5> array = { 1, 2, 3, 4, 5 };
rotate(array, 1);
std::vector<int> vector = { 1, 2, 3, 4, 5 };
rotate(vector, 3);
std::list<int> list = { 1, 2, 3, 4, 5 };
rotate(list, 2);
int raw_array[] = { 1, 2, 3, 4, 5 };
rotate(raw_array, 3);
// test output goes here...
}
Note how std::begin and std::end make sure that raw arrays (with their begin + N syntax) and container classes (with their c.begin() + N syntax) are both supported, and std::advance makes the function work for containers with non-random-access iterators like std::list (where you must increment iterators repeatedly to advance them by more than one element).
By the way, if you want to support n arguments greater than or equal to the container's size, then you can use the C++17 function std::size or just create your own. And perhaps use assert to catch accidental negative arguments:
assert(n >= 0);
using std::size;
n = n % size(container);
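Since the Python function returns a new list rather than modifying its argument, std::rotate_copy is arguably the closer match. The following is a small sketch of that variant, restricted to std::vector<int> for brevity. The name rotated is hypothetical, and negative n is rejected by the assert, as suggested above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a rotated copy, like Python's lst[n:] + lst[:n].
// Assumption: n is not negative.
std::vector<int> rotated(const std::vector<int>& input, int n) {
    assert(n >= 0);
    std::vector<int> result(input.size());
    if (input.empty()) {
        return result;  // nothing to rotate, and avoids a modulo by zero
    }
    // Wrap n so that values >= input.size() are handled as well.
    const std::ptrdiff_t shift = n % static_cast<int>(input.size());
    // The element at input.begin() + shift becomes the first element of result.
    std::rotate_copy(input.begin(), input.begin() + shift, input.end(),
                     result.begin());
    return result;
}
```

For { 1, 2, 3, 4, 5 } and n = 1 this yields { 2, 3, 4, 5, 1 }, the same as the Python example, and the input is left untouched.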
If there is no equivalent function, is it possible to cleanly generate a QList<int> of (1, 2, 3, 4, 5 ... ) with one line of code, avoiding a for loop or having to write my own function?
I don't know the particularities of Qt containers, but in the STL you could do something like:
std::vector<int> v(n);
std::iota(v.begin(), v.end(), 1);
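Or, if not using C++11, std::generate(v.begin(), v.end(), my_iota(1)); where my_iota is a functor written by you whose operator() simply returns n++, with the initial value of n provided in the constructor. (std::generate takes an iterator pair; std::generate_n takes an iterator and a count instead.)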
If Qt containers provide iterators that comply with the STL OutputIterator concept you should be OK using std::generate or std::iota.
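A possible shape for the functor described above, shown with std::vector. This is only a sketch: my_iota follows the description in the text, while make_sequence is a hypothetical wrapper added for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Functor that hands out start, start + 1, start + 2, ... on successive calls.
struct my_iota {
    explicit my_iota(int start) : n(start) {}
    int operator()() { return n++; }
    int n;
};

// Hypothetical wrapper: builds a vector of `count` consecutive ints
// beginning at `start`.
std::vector<int> make_sequence(int count, int start) {
    assert(count >= 0);
    std::vector<int> v(count);
    // std::generate calls the functor once per element, in order.
    std::generate(v.begin(), v.end(), my_iota(start));
    return v;
}
```

make_sequence(5, 1) produces 1, 2, 3, 4, 5, the same result as the std::iota call.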
Qt containers (QList and QVector) provide STL compatible iterators that can utilize this functionality:
#include <QDebug>
#include <QVector>
#include <numeric>
inline QVector<int> range(int start, int end)
{
QVector<int> l(end-start+1);
std::iota(l.begin(), l.end(), start);
return l;
}
int main()
{
qDebug() << range(-3, 4);
return 0;
}
prints
QVector(-3, -2, -1, 0, 1, 2, 3, 4)
I have written a good bit of code in python and it works great. But now I'm scaling up the size of the problems that I'm analyzing and python is dreadfully slow. The slow part of the python code is
for i in range(0,H,1):
x1 = i - length
x2 = i + length
for j in range(0,W,1):
#print i, ',', j # check the limits
y1 = j - length
y2 = j + length
IntRed[i,j] = np.mean(RawRed[x1:x2,y1:y2])
With H and W equal to 1024 the function takes around 5 minutes to execute. I've written a simple C++ program/function that performs the same computation, and it executes in less than a second with the same data size.
double summ = 0;
double total_num = 0;
double tmp_num = 0 ;
int avesize = 2;
for( i = 0+avesize; i <X-avesize ;i++)
for(j = 0+avesize;j<Y-avesize;j++)
{
// loop through sub region of the matrix
// if the value is not zero add it to the sum
// and increment the counter.
for( int ii = -2; ii < 2; ii ++)
{
int iii = i + ii;
for( int jj = -2; jj < 2 ; jj ++ )
{
int jjj = j + jj;
tmp_num = gsl_matrix_get(m,iii,jjj);
if(tmp_num != 0 )
{
summ = summ + tmp_num;
total_num++;
}
}
}
gsl_matrix_set(Matrix_mean,i,j,summ/total_num);
summ = 0;
total_num = 0;
}
I have some other methods to perform on the 2D array. The one listed is a simple example.
What I want to do is pass a python 2D array to my c++ function and return a 2D array back to python.
I've read a bit about SWIG and have searched previous questions, and it seems like a possible solution. But I can't seem to figure out what I actually need to do.
Can I get any help? Thanks
You can use arrays as described here: Doc - 5.4.5 Arrays, or use carray.i or std_vector.i from the SWIG library.
I find it easier to work with std::vector, via std_vector.i from the SWIG library, to send a Python list to a C++ SWIG extension. In your case, though, where optimization matters, it may not be optimal.
In your case you can define:
test.i
%module test
%{
#include "test.h"
%}
%include "std_vector.i"
namespace std {
%template(Line) vector < int >;
%template(Array) vector < vector < int> >;
}
void print_array(std::vector< std::vector < int > > myarray);
test.h
#ifndef TEST_H__
#define TEST_H__
#include <stdio.h>
#include <vector>
void print_array(std::vector< std::vector < int > > myarray);
#endif /* TEST_H__ */
test.cpp
#include "test.h"
void print_array(std::vector< std::vector < int > > myarray)
{
for (int i=0; i<2; i++)
for (int j=0; j<2; j++)
printf("[%d][%d] = [%d]\n", i, j, myarray[i][j]);
}
If you run the following python code (I used python 2.6.5), you can see that the C++ function can access the python list:
>>> import test
>>> a = test.Array()
>>> a = [[0, 1], [2, 3]]
>>> test.print_array(a)
[0][0] = [0]
[0][1] = [1]
[1][0] = [2]
[1][1] = [3]
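For the use case in the question, the C++ side would also have to return a 2D array. The following is only a sketch of such a function in plain C++: the names Matrix and local_mean are hypothetical, it clips the window at the matrix borders, and unlike the GSL code in the question it does not skip zero entries. Exposing it to Python would presumably need %template lines for the double vectors, analogous to the int ones in test.i above, which is an assumption not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Hypothetical function: for every cell (i, j), average the input over the
// window [i - half, i + half) x [j - half, j + half), clipped to the matrix.
// Assumption: all rows of `in` have the same length.
Matrix local_mean(const Matrix& in, int half) {
    assert(half >= 0);
    const int rows = static_cast<int>(in.size());
    const int cols = rows > 0 ? static_cast<int>(in[0].size()) : 0;
    Matrix out(rows, std::vector<double>(cols, 0.0));
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            double sum = 0.0;
            int count = 0;
            for (int ii = std::max(0, i - half); ii < std::min(rows, i + half); ++ii) {
                for (int jj = std::max(0, j - half); jj < std::min(cols, j + half); ++jj) {
                    sum += in[ii][jj];
                    ++count;
                }
            }
            // An empty window (for example half == 0) leaves the cell at 0.
            out[i][j] = count > 0 ? sum / count : 0.0;
        }
    }
    return out;
}
```

Taking the input by const reference avoids one copy on the C++ side. The conversion between the Python list and std::vector still copies the data, which is why this route may not be the fastest for large arrays.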