Permutation with backtracking from C to Python

I have to write a program that prints all permutations of n numbers {1, 2, 3, ..., n} using backtracking. I managed to do it in C, and it works very well. Here is the code:
#include <stdio.h>
int st[25], n = 4;
int valid(int k)
{
    int i;
    for (i = 1; i <= k - 1; i++)
        if (st[k] == st[i])
            return 0;
    return 1;
}
void bktr(int k)
{
    int i;
    if (k == n + 1)
    {
        for (i = 1; i <= n; i++)
            printf("%d ", st[i]);
        printf("\n");
    }
    else
        for (i = 1; i <= n; i++)
        {
            st[k] = i;
            if (valid(k))
                bktr(k + 1);
        }
}
int main()
{
    bktr(1);
    return 0;
}
Now I have to write it in Python. Here is what I did:
st=[]
n=4
def bktr(k):
    if k==n+1:
        for i in range(1,n):
            print (st[i])
    else:
        for i in range(1,n):
            st[k]=i
            if valid(k):
                bktr(k+1)
def valid(k):
    for i in range(1,k-1):
        if st[k]==st[i]:
            return 0
    return 1
bktr(1)
I get this error:
IndexError: list assignment index out of range
raised at the line st[k]=i.

Python has a "permutations" function in the itertools module:
import itertools
itertools.permutations([1,2,3])
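For example, to get every permutation of {1, 2, ..., n} (here n = 3), turn the iterator into a list. The tuples come out in lexicographic order, which is also the order in which the C backtracking program prints them:

```python
import itertools

n = 3
# permutations() returns a lazy iterator of tuples; list() materializes it.
perms = list(itertools.permutations(range(1, n + 1)))
print(perms)
# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
```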
If you need to write the code yourself (for example, if this is homework), here are the issues:
Python lists do not have a predetermined size, so you can't just set, e.g., the 10th element to 3. You can only change existing elements or add to the end.
Python lists (like C arrays) also start at 0. This means you have to access the first element with st[0], not st[1].
When you start your program, st has a length of 0; this means you cannot assign to st[1], because that element does not exist yet.
If this is confusing, I recommend you use the st.append(element) method instead, which always adds to the end.
Once the code is done and works, I recommend you head over to Code Review Stack Exchange, because there are a lot more things that could be improved.
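Besides the list-size problem, the ranges in the posted Python code are off by one: C's `for (i = 1; i <= n; i++)` corresponds to `range(1, n + 1)`, and `for (i = 1; i <= k - 1; i++)` corresponds to `range(1, k)`. Here is a minimal sketch of a direct translation under those corrections. It preallocates the list with an unused slot 0 so positions stay 1-based like the C code (a choice made only to mirror the original), and it collects results in a list instead of printing them; the name `permutations_backtracking` is made up for this example.

```python
def permutations_backtracking(n):
    """Return all permutations of 1..n using the same backtracking as the C code."""
    st = [0] * (n + 1)  # index 0 is unused so positions stay 1-based, as in C
    results = []

    def valid(k):
        # Position k is valid if its value differs from every earlier position 1..k-1.
        for i in range(1, k):
            if st[k] == st[i]:
                return False
        return True

    def bktr(k):
        if k == n + 1:
            results.append(st[1:])  # copy of positions 1..n
        else:
            for i in range(1, n + 1):
                st[k] = i
                if valid(k):
                    bktr(k + 1)

    bktr(1)
    return results


for perm in permutations_backtracking(3):
    print(*perm)
```

Because `st` is preallocated, `st[k] = i` always assigns to an existing element, which is exactly what the IndexError was complaining about.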

Related

Performance Discrepancy Between C++ and Python for Project Euler

I'm experiencing a slightly bizarre performance discrepancy between two equivalent programs, and I cannot find any real reason for the difference.
I'm solving Project Euler Problem 46. Both solutions (one in Python and one in C++) get the right answer. However, the Python solution seems to be faster, which is the opposite of what I was expecting.
Do not worry about the actual algorithm being optimal - all I care about is that they are two equivalent programs. I'm sure there is a better algorithm.
Python Solution
import math
import time
UPPER_LIMIT = 1000000
HIT_COUNT = 0
def sieveOfErato(number):
    sieve = [True] * number
    for i in xrange(2, int(math.ceil(math.sqrt(number)))):
        if sieve[i]:
            for j in xrange(i**2, number, i):
                sieve[j] = False
    primes = [i for i, val in enumerate(sieve) if i > 1 and val == True]
    return set(primes)
def isSquare(number):
    ans = math.sqrt(number).is_integer()
    return ans
def isAppropriateGolbachNumber(number, possiblePrimes):
    global HIT_COUNT
    for possiblePrime in possiblePrimes:
        if possiblePrime < number:
            HIT_COUNT += 1
            difference = number - possiblePrime
            if isSquare(difference / 2):
                return True
    return False
if __name__ == '__main__':
    start = time.time()
    primes = sieveOfErato(UPPER_LIMIT)
    answer = -1
    for odd in xrange(3, UPPER_LIMIT, 2):
        if odd not in primes:
            if not isAppropriateGolbachNumber(odd, primes):
                answer = odd
                break
    print('Hit Count: {}'.format(HIT_COUNT))
    print('Loop Elapsed Time: {}'.format(time.time() - start))
    print('Answer: {}'.format(answer))
C++ Solution
#include <iostream>
#include <unordered_set>
#include <vector>
#include <math.h>
#include <cstdio>
#include <cstring> // memset
#include <ctime>
int UPPER_LIMIT = 1000000;
std::unordered_set<int> sieveOfErato(int number)
{
    std::unordered_set<int> primes;
    bool sieve[number+1];
    memset(sieve, true, sizeof(sieve));
    for(int i = 2; i * i <= number; i++)
    {
        if (sieve[i] == true)
        {
            for (int j = i*i; j < number; j+=i)
            {
                sieve[j] = false;
            }
        }
    }
    for(int i = 2; i < number; i++)
    {
        if (sieve[i] == true)
        {
            primes.insert(i);
        }
    }
    return primes;
}
bool isPerfectSquare(const int& number)
{
    int root(round(sqrt(number)));
    return number == root * root;
}
int hitCount = 0;
bool isAppropriateGoldbachNumber(const int& number, const std::unordered_set<int>& primes)
{
    int difference;
    for (const auto& prime : primes)
    {
        if (prime < number)
        {
            hitCount++;
            difference = (number - prime)/2;
            if (isPerfectSquare(difference))
            {
                return true;
            }
        }
    }
    return false;
}
int main(int argc, char** argv)
{
    std::clock_t start;
    double duration;
    start = std::clock();
    std::unordered_set<int> primes = sieveOfErato(UPPER_LIMIT);
    int answer = -1;
    for(int odd = 3; odd < UPPER_LIMIT; odd+=2)
    {
        if (primes.find(odd) == primes.end())
        {
            if (!isAppropriateGoldbachNumber(odd, primes))
            {
                answer = odd;
                break;
            }
        }
    }
    duration = (std::clock() - start) / (double) CLOCKS_PER_SEC;
    std::cout << "Hit Count: " << hitCount << std::endl;
    std::cout << std::fixed << "Loop Elapsed Time: " << duration << std::endl;
    std::cout << "Answer: " << answer << std::endl;
}
I'm compiling my C++ code with g++ -std=c++14 file.cpp (no -O optimization flag) and then executing it with just ./a.out.
On a couple of test runs just using the time command from the command line, I get:
Python
Hit Count: 128854
Loop Elapsed Time: 0.393740177155
Answer: 5777
real 0m0.525s
user 0m0.416s
sys 0m0.049s
C++
Hit Count: 90622
Loop Elapsed Time: 0.993970
Answer: 5777
real 0m1.027s
user 0m0.999s
sys 0m0.013s
Why would the Python version have more hits and still return more quickly? I would think that more hits means more iterations, which means slower (and it's in Python). I'm guessing that there's just a performance blunder in my C++ code, but I haven't found it yet. Any ideas?

I concur with Kunal Puri's answer that a better algorithm and data structure can improve performance, but it does not answer the core question: why does the same algorithm, using the same data structure, run faster in Python?
It all boils down to the difference between std::unordered_set and Python's set. Note that the same C++ code with std::set runs faster than the Python version, and if optimization is enabled (with -O2), the C++ code with std::set runs more than 10 times faster than Python.
Several works show that, and why, std::unordered_set performs poorly. For example, you can watch C++Now 2018: You Can Do Better than std::unordered_map: New Improvements to Hash Table Performance. Python's set does not seem to suffer from these design flaws.
One of the things that make std::unordered_set so slow is the large number of indirections it mandates just to reach an element. For example, during iteration, the iterator points to a bucket before the current bucket. Another thing to consider is poorer cache locality. Python's set seems to retain the original order of elements (for small non-negative ints, CPython hashes an int to itself, so they tend to land in slots in ascending order; this is an implementation detail, not a guarantee), while GCC's std::unordered_set tends to produce a random order. This is the cause of the difference in HIT_COUNT between C++ and Python. Once the code uses std::set, HIT_COUNT becomes the same for C++ and Python. Retaining the original order during iteration tends to improve the cache locality of nodes in a new process, since they are iterated in the same order as they were allocated (and two adjacent allocations in a new process have a higher chance of landing at consecutive memory addresses).
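To see that iteration order changes the hit count but not the answer, here is a small Python 3 sketch (Python 3.8+ for `math.isqrt`). The limit is lowered to 10,000 only to keep it quick, and `sieve` and `search` are names made up for this example. It runs the same search twice, once over the primes in ascending order and once over a shuffled copy standing in for the arbitrary bucket order of `std::unordered_set`. It tests `diff % 2 == 0` explicitly instead of relying on Python 2 integer division.

```python
import math
import random


def sieve(limit):
    """Return the primes below limit, in ascending order (sieve of Eratosthenes)."""
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(limit - 1) + 1):
        if is_prime[i]:
            for j in range(i * i, limit, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def search(primes_in_order, prime_set, limit):
    """Find the first odd composite that is not p + 2*k*k, visiting primes in
    the given order. Returns (answer, hits); hits counts primes p < odd that
    were tested, like HIT_COUNT in the question."""
    hits = 0
    for odd in range(3, limit, 2):
        if odd in prime_set:
            continue
        for p in primes_in_order:
            if p < odd:
                hits += 1
                diff = odd - p
                half = diff // 2
                if diff % 2 == 0 and math.isqrt(half) ** 2 == half:
                    break  # odd == p + 2 * half, and half is a perfect square
        else:
            return odd, hits  # no prime worked, so this odd is the answer
    return -1, hits


LIMIT = 10_000
primes = sieve(LIMIT)
prime_set = set(primes)

ascending = search(primes, prime_set, LIMIT)
shuffled_primes = primes[:]
random.Random(0).shuffle(shuffled_primes)  # fixed seed, arbitrary order
shuffled = search(shuffled_primes, prime_set, LIMIT)

print("ascending:", ascending)
print("shuffled: ", shuffled)
```

Both runs should report 5777 as the answer, while the hit counts will generally differ, since visiting small primes first finds a valid decomposition sooner.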

Apart from compiler optimization as suggested by DYZ, I have some more observations regarding optimization.
1) Use std::vector instead of std::unordered_set.
In your code, you are doing this:
std::unordered_set<int> sieveOfErato(int number)
{
    std::unordered_set<int> primes;
    bool sieve[number+1];
    memset(sieve, true, sizeof(sieve));
    for(int i = 2; i * i <= number; i++)
    {
        if (sieve[i] == true)
        {
            for (int j = i*i; j < number; j+=i)
            {
                sieve[j] = false;
            }
        }
    }
    for(int i = 2; i < number; i++)
    {
        if (sieve[i] == true)
        {
            primes.insert(i);
        }
    }
    return primes;
}
I don't see any reason for using std::unordered_set here. The sieve produces the primes in ascending order, so you could store them in a sorted std::vector instead:
std::vector<int> sieveOfErato(int number)
{
    bool sieve[number+1];
    memset(sieve, true, sizeof(sieve));
    for(int i = 2; i * i <= number; i++)
    {
        if (sieve[i] == true)
        {
            for (int j = i*i; j < number; j+=i)
            {
                sieve[j] = false;
            }
        }
    }
    // The loop above only visits i <= sqrt(number), so it cannot count all
    // primes; append them instead. The vector ends up in ascending order.
    std::vector<int> primes;
    for(int i = 2; i < number; i++)
    {
        if (sieve[i] == true)
        {
            primes.push_back(i);
        }
    }
    return primes;
}
As far as find() is concerned, you may do this:
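std::size_t j = 0;
for(int odd = 3; odd < UPPER_LIMIT; odd+=2)
{
    // Advance j to the first prime >= odd; j never moves backwards.
    while (j < primes.size() && primes[j] < odd) {
        j++;
    }
    // Check j before reading primes[j] so we never index past the end.
    if (j == primes.size() || primes[j] != odd)
    {
        // isAppropriateGoldbachNumber must now take a const std::vector<int>&
        if (!isAppropriateGoldbachNumber(odd, primes))
        {
            answer = odd;
            break;
        }
    }
}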
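This works because both `odd` and the sorted primes only move forward, so a single index can replace every hash lookup (a merge-style walk over two sorted sequences). A minimal Python sketch of the same idea, assuming `primes` is sorted and contains every prime below `limit`; `odd_composites` is a hypothetical helper name. The `j == len(primes)` check keeps the index in bounds once `odd` passes the largest prime.

```python
def odd_composites(primes, limit):
    """Yield the odd composites in [3, limit), using a forward-only index into
    the sorted list primes (assumed to hold every prime below limit) instead
    of a set lookup."""
    j = 0
    for odd in range(3, limit, 2):
        # Skip primes smaller than odd; j never moves backwards.
        while j < len(primes) and primes[j] < odd:
            j += 1
        if j == len(primes) or primes[j] != odd:
            yield odd


print(list(odd_composites([2, 3, 5, 7, 11, 13, 17, 19, 23, 29], 30)))
# [9, 15, 21, 25, 27]
```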
2) Precompute perfect squares in a std::vector beforehand instead of calling sqrt every time.
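A Python sketch of suggestion 2, assuming the primes are kept in an ascending list as in suggestion 1: a lookup table of perfect squares is built once, so each test becomes an index instead of a `sqrt` call. The helper names `square_table` and `is_appropriate_goldbach` are hypothetical, and whether this actually beats `sqrt` in the C++ version is something to measure rather than assume.

```python
def square_table(limit):
    """Return a list where is_square[x] is True exactly when x (0 <= x < limit)
    is a perfect square; built once instead of calling sqrt for every test."""
    is_square = [False] * limit
    k = 0
    while k * k < limit:
        is_square[k * k] = True
        k += 1
    return is_square


def is_appropriate_goldbach(number, primes, is_square):
    """True if number == p + 2*k*k for some prime p in the ascending list primes.
    is_square must cover indices up to number // 2."""
    for p in primes:
        if p >= number:
            break  # primes is ascending, so no later prime can be smaller
        diff = number - p
        if diff % 2 == 0 and is_square[diff // 2]:
            return True
    return False


primes = [p for p in range(2, 100) if all(p % d for d in range(2, p))]
is_square = square_table(100)
print(is_appropriate_goldbach(33, primes, is_square))  # True: 33 = 31 + 2*1*1
```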

Transform python yield into c++

I have a piece of Python code that I need to use in C++. The algorithm is a recursion that uses yield.
Here is the Python function:
def getSubSequences(self, s, minLength=1):
    if len(s) >= minLength:
        for i in range(minLength, len(s) + 1):
            for p in self.getSubSequences(s[i:], 1 if i > 1 else 2):
                yield [s[:i]] + p
    elif not s:
        yield []
And here is my attempt so far:
vector< vector<string> > getSubSequences(string number, unsigned int minLength=1) {
    if (number.length() >= minLength) {
        for (unsigned int i=minLength; i<=number.length()+1; i++) {
            string sub = "";
            if (i <= number.length())
                sub = number.substr(i);
            vector< vector<string> > res = getSubSequences(sub, (i > 1 ? 1 : 2));
            vector< vector<string> > container;
            vector<string> tmp;
            tmp.push_back(number.substr(0, i));
            container.push_back(tmp);
            for (unsigned int j=0; j<res.size(); j++) {
                container.push_back(res.at(j));
                return container;
            }
        }
    } else if (number.length() == 0)
        return vector< vector<string> >();
}
Unfortunately, I get a segmentation fault when executing it. Is this even the right approach, or is there an easier way to do this? The data structures are not fixed; I just need the same result as I get from the Python code!

The loops in your above code snippets are not equivalent.
The Python code has
for i in range(minLength, len(s) + 1):
The C++ code has
for (unsigned int i=minLength; i<=number.length()+1; i++) {
So the Python loop terminates one iteration sooner than the C++ one.
The question really has nothing to do with yield. In cases like this, print intermediate results from both implementations and compare them; here, that would have shown where the two algorithms diverge.
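Since the C++ function has to return the whole result at once, it may help to first write a yield-free Python version and port that. In this sketch the name `get_subsequences` and the dropped `self` are for illustration only. Note the base case: for an empty string the generator yields one empty list, so the eager version returns `[[]]` (one empty split), not `[]`. The C++ attempt differs from this in more places than the loop bound: `return vector< vector<string> >();` returns zero splits instead of one empty split, `return container;` sits inside the inner loop so the function returns as soon as it copies the first element of `res`, and some paths reach the end of the function without returning a value, which is undefined behavior in C++.

```python
def get_subsequences(s, min_length=1):
    """Eager (list-returning) version of the question's generator.

    Returns every way to cut s into consecutive pieces under the same rule as
    the original: the first piece has length >= min_length, and a piece of
    length 1 must be followed by a piece of length >= 2.
    """
    result = []
    if len(s) >= min_length:
        # Same bounds as range(min_length, len(s) + 1): i goes up to len(s).
        for i in range(min_length, len(s) + 1):
            for rest in get_subsequences(s[i:], 1 if i > 1 else 2):
                result.append([s[:i]] + rest)
    elif not s:
        # The generator yields one empty list here: one way to split "".
        result.append([])
    return result


print(get_subsequences("123"))
# [['1', '23'], ['12', '3'], ['123']]
```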

Is there an equivalent to a nested recursive function in C?

First of all, I know that nested functions are not supported by the C standard.
However, it's often very useful, in other languages, to define an auxiliary recursive function that will make use of data provided by the outer function.
Here is an example, computing the number of solutions of the N-queens problem, in Python. It's easy to write the same in Lisp, Ada or Fortran for instance, which all allow some kind of nested function.
def queens(n):
    a = list(range(n))
    u = [True]*(2*n - 1)
    v = [True]*(2*n - 1)
    m = 0
    def sub(i):
        nonlocal m
        if i == n:
            m += 1
        else:
            for j in range(i, n):
                p = i + a[j]
                q = i + n - 1 - a[j]
                if u[p] and v[q]:
                    u[p] = v[q] = False
                    a[i], a[j] = a[j], a[i]
                    sub(i + 1)
                    u[p] = v[q] = True
                    a[i], a[j] = a[j], a[i]
    sub(0)
    return m
Now my question: is there a way to do something like this in C? I can think of two solutions, using globals or passing data as parameters, but they both seem rather unsatisfying.
There is also a way to write this as an iterative program, but it's clumsy: actually, I first wrote the iterative solution in Fortran 77 for Rosetta Code and then wanted to sort out this mess. Fortran 77 does not have recursive functions.
For those who wonder, the function represents the NxN board as a permutation of [0, 1, ..., N-1], so that each queen is alone on its row and column. The function looks for all permutations that are also solutions of the problem, starting by checking the first column (actually nothing to check), then the second, and recursively calling itself only when the first i columns are in a valid configuration.

Of course. You need to simulate the special environment used by your nested function with static variables at module (file) level. Declare them above your nested function.
To keep this state from leaking into the rest of the program, put this whole thing into a separate module (source file).

Editor's Note: This answer was moved from the content of a question edit; it was written by the original poster.
Thanks all for the advice. Here is a solution using a structure passed as an argument. This is roughly equivalent to what gfortran and GNAT do internally to deal with nested functions. The argument i could also be passed in the structure, by the way.
The inner function is declared static to help compiler optimizations. If it's not recursive, its code can then be inlined into the outer function (tested with GCC on a simple example), since the compiler knows the function will not be called from "outside".
#include <stdio.h>
#include <stdlib.h>
struct queens_data {
    int n, m, *a, *u, *v;
};
static void queens_sub(int i, struct queens_data *e) {
    if(i == e->n) {
        e->m++;
    } else {
        int p, q, j;
        for(j = i; j < e->n; j++) {
            p = i + e->a[j];
            q = i + e->n - 1 - e->a[j];
            if(e->u[p] && e->v[q]) {
                int k;
                e->u[p] = e->v[q] = 0;
                k = e->a[i];
                e->a[i] = e->a[j];
                e->a[j] = k;
                queens_sub(i + 1, e);
                e->u[p] = e->v[q] = 1;
                k = e->a[i];
                e->a[i] = e->a[j];
                e->a[j] = k;
            }
        }
    }
}
int queens(int n) {
    int i;
    struct queens_data s;
    s.n = n;
    s.m = 0;
    s.a = malloc((5*n - 2)*sizeof(int));
    s.u = s.a + n;
    s.v = s.u + 2*n - 1;
    for(i = 0; i < n; i++) {
        s.a[i] = i;
    }
    for(i = 0; i < 2*n - 1; i++) {
        s.u[i] = s.v[i] = 1;
    }
    queens_sub(0, &s);
    free(s.a);
    return s.m;
}
int main() {
    int n;
    for(n = 1; n <= 16; n++) {
        printf("%d %d\n", n, queens(n));
    }
    return 0;
}
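For comparison with the original Python, the same structure-passing technique can be written in Python by turning the closure into an explicit environment record that a top-level helper receives as an argument (essentially closure conversion: the closure's free variables become fields of a record). A minimal sketch; the class name `QueensData` just mirrors the C struct and is not from any library.

```python
from dataclasses import dataclass, field


@dataclass
class QueensData:
    """Explicit environment for the helper, playing the role of struct queens_data."""
    n: int
    m: int = 0
    a: list = field(default_factory=list)  # current permutation (row of each column)
    u: list = field(default_factory=list)  # free flags for one diagonal direction
    v: list = field(default_factory=list)  # free flags for the other diagonal direction


def queens_sub(i, e):
    """Top-level recursive helper: all shared state comes in through e."""
    if i == e.n:
        e.m += 1
        return
    for j in range(i, e.n):
        p = i + e.a[j]
        q = i + e.n - 1 - e.a[j]
        if e.u[p] and e.v[q]:
            e.u[p] = e.v[q] = False
            e.a[i], e.a[j] = e.a[j], e.a[i]
            queens_sub(i + 1, e)
            e.u[p] = e.v[q] = True
            e.a[i], e.a[j] = e.a[j], e.a[i]


def queens(n):
    e = QueensData(n=n, a=list(range(n)),
                   u=[True] * (2 * n - 1), v=[True] * (2 * n - 1))
    queens_sub(0, e)
    return e.m


for n in range(1, 9):
    print(n, queens(n))
```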

unusual speed difference between python and c++ programs

I wrote the same program in C++ and Python. In Python it takes an unusual amount of time (actually, I didn't get an answer at all). Can anybody explain why?
C++ code:
#include<iostream>
using namespace std;
int main(){
    int n = 1000000;
    // n + 1 elements: indices 1..n are used below (new int[n] would overflow)
    int *solutions = new int[n + 1];
    for (int i = 1; i <= n; i++){
        solutions[i] = 0;
    }
    for (int v = 1; v <= n; v++){
        for (int u = 1; u*v <= n; u++){
            if ((3 * v>u) & (((u + v) % 4) == 0) & (((3 * v - u) % 4) == 0)){
                solutions[u*v]++;
            }
        }
    }
    int count = 0;
    for (int i = 1; i < n; i++){
        if ((solutions[i])==10)
            count += 1;
    }
    cout << count;
    delete[] solutions;
}
Python code:
n=1000000
l=[0 for x in range(n+1)]
for u in range(1,n+1):
    v=1
    while u*v<n+1:
        if (((u+v)%4)==0) and (((3*v-u)%4)==0) and (3*v>u):
            l[u*v]+=1
        v+=1
print(l.count(10))

You can try optimizing this loop, for example by making it a single block with no ifs in it, or else move it into a C extension module.
A C++ compiler does optimizations the Python runtime can't, so with a pure interpreter you will never get anything close to the same performance.
And 1M iterations is a lot (the nested loop actually runs about n ln n, roughly 14 million, times). I wouldn't start with any interpreter in that range; you'd be better off doing it in a browser with JavaScript.
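As an illustration of taking the tests out of the inner loop, here is a Python sketch; the function name and the reformulation are mine, not from the question. If (u + v) % 4 == 0, then v ≡ -u (mod 4), so 3v - u ≡ -3u - u = -4u ≡ 0 (mod 4): the second modular test is always true. The condition 3 * v > u just means v ≥ u // 3 + 1. So for each u the loop can start at the first valid v and step v by 4, which steps the product u*v by 4u, with no `if` at all. This visits only about a quarter as many (u, v) pairs or fewer, but it is still pure Python, so expect it to remain slower than the C++ version.

```python
def count_n_with_k_solutions(n, k):
    """Count m in [0, n] for which exactly k pairs (u, v) satisfy the question's
    three conditions with u*v == m (same counting as l.count(k) in the question)."""
    counts = [0] * (n + 1)
    for u in range(1, n + 1):
        v = u // 3 + 1       # smallest v with 3*v > u
        v += (-u - v) % 4    # advance to the first such v with (u + v) % 4 == 0
        # Stepping v by 4 keeps (u + v) % 4 == 0, so m = u*v steps by 4*u.
        for m in range(u * v, n + 1, 4 * u):
            counts[m] += 1
    return counts.count(k)


print(count_n_with_k_solutions(1_000_000, 10))
```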

Returning C arrays into python scope from scipy's weave.inline

I am using scipy's weave.inline to perform computationally expensive tasks. I have problems returning a one-dimensional array back into the Python scope. weave.inline uses a special argument called "return_val" for the purpose of returning values back into the Python scope.
The following example returning an integer value works well:
>>> from scipy.weave import inline
>>> print inline(r'''int N = 10; return_val = N;''')
10
However, the following example, which compiles without raising an error, does not return the array I would expect:
>>> from scipy.weave import inline
>>> code =\
r'''
int* pairs;
int lenght = 0;
for (int i=0;i<N;i++){
lenght += 1;
pairs = (int *)malloc(sizeof(int)*lenght);
pairs[i] = i;
std::cout << pairs[i] << std::endl;
}
return_val = pairs;
'''
>>> N = 5
>>> R = inline(code,['N'])
>>> print "RETURN_VAL:",R
0
1
2
3
4
RETURN_VAL: 1
I need to resize the array "pairs" dynamically, which is why I can't pass in a numpy.array or Python list per se.

All you need to do is use the raw Python C-API calls, or, if you're looking for something a bit more convenient, the built-in scipy weave wrappers.
No guarantees about leaks or efficiency, but it should look something a bit like this:
from scipy.weave import inline
code = r'''
py::list ret;
for(int i = 0; i < N; i++) {
py::list item;
for(int j = 0; j < i; j++) {
item.append(j);
}
ret.append(item);
}
return_val = ret;
'''
N = 5
R = inline(code,['N'])
print R

If you truly don't know the size of the output array in advance, you must create it in your inline code. I'm pretty sure that your array allocated with malloc will leak memory, since you have no way of controlling when it is freed.
The solution is to create a numpy array, fill it with your function's results and return it.
import scipy.weave
code = r"""
npy_intp dims[1] = {n};
PyObject* out_array = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
double* data = (double*) ((PyArrayObject*) out_array)->data;
for (int i=0; i<n; ++i) data[i] = i;
return_val = out_array;
Py_XDECREF(out_array);
"""
n = 5
out_array = scipy.weave.inline(code, ["n"])
print "Array:", out_array
