Julia code not finishing while Python code does


I am very new to Julia and was trying to rewrite some of my previous Python code as practice, using Project Euler problem 25.
In Python I have:
def fibonacci(N):
    """Returns the Nth Fibonacci Number"""
    F = [0, 1]
    i = 0
    while i <= N-2:
        F_new = F[i] + F[i+1]
        F.append(F_new)
        i += 1
    return F[N]
N = 0
x = 1000
while len(str(fibonacci(N))) <= x:
    if len(str(fibonacci(N))) == x:
        print(N)
        break
    N = N + 1
This runs and gives me the correct answer in about 6.5 seconds. When I try to do the same in Julia:
function fib(N)
    F = [0, 1]
    global i = 1
    while i <= N-2
        F_new = F[i] + F[i+1]
        append!(F, F_new)
        global i += 1
    end
    return F[N]
end
N = 1
x = 1000
while length(string(fib(N))) <= x
    if length(string(fib(N))) == x
        print(N-1)
        break
    end
    global N += 1
end
The code seems to run "forever". The Julia code only finishes and produces the correct answer when x <= 20; for x > 20 the program never ends.
I'm not sure what could go wrong if it works for all values below 21. Could somebody explain where the error is happening and why?

Python integers are unbounded in size by default and will grow as needed. Julia, on the other hand, defaults to a signed 64-bit integer (Int64) on a 64-bit system (see the docs). Int64 overflows once a value exceeds 9223372036854775807, which has 19 digits, and that is why the problem shows up around x=20. To get the same behavior in Julia, use the BigInt type for any values or arguments that can grow beyond this size.
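To see the wraparound concretely, here is a minimal sketch in Python that imitates signed 64-bit arithmetic. The helper names wrap_int64, fib_int64 and fib_exact are made up for this illustration and do not appear in the question; Python's own integers never overflow, so the wraparound is simulated by hand.

```python
def wrap_int64(value):
    """Reduce an integer to the signed 64-bit range (simulated two's-complement wraparound)."""
    value &= (1 << 64) - 1               # keep only the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value


def fib_int64(n):
    """n-th Fibonacci number computed with simulated 64-bit overflow."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, wrap_int64(a + b)
    return a


def fib_exact(n):
    """n-th Fibonacci number using Python's unbounded integers."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    print(fib_int64(92) == fib_exact(92))   # F(92) still fits into 64 bits
    print(fib_int64(93) == fib_exact(93))   # F(93) does not, so the wrapped value differs
    # Longest printed form among the first 300 wrapped values
    print(max(len(str(fib_int64(n))) for n in range(300)))
```

A wrapped 64-bit value never prints with more than 20 characters (19 digits plus a possible minus sign), so a loop that waits for a string of length 1000 cannot terminate.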

The main problem with your code is what @duckboycool has described. The second piece of advice is to always write functions in Julia. Read the Julia performance tips page for a good start.
Note that you can make the function by @Bill 2X faster by removing the unnecessary if, like this:
function test(x = 1000)
    N = 0
    while ndigits(fib(N)) < x
        N += 1
    end
    return N
end
But if you really want a 16000X faster Julia version, then you can do this:
function euler25()
    limit = big(10)^999
    a, b = big(1), big(1)
    N = 2
    while b <= limit
        a, b = b, a + b
        N += 1
    end
    return N
end
@btime euler25() # returns 4782
377.700 μs (9573 allocations: 1.15 MiB)
This runs in 377 μs, because we avoid calculating fib(N) at every step from the beginning. And instead of comparing with the length of a string of the output at each iteration, we just compare with 10^999.
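For comparison, the same incremental idea can be sketched in Python, the question's original language. The function name first_fib_index_with_digits and its digits parameter are assumptions made for this illustration; the sketch starts from F(0) = 0, F(1) = 1 so that small digit counts also work.

```python
def first_fib_index_with_digits(digits=1000):
    """Index of the first Fibonacci number with at least `digits` decimal digits."""
    limit = 10 ** (digits - 1)    # smallest number with `digits` digits
    a, b = 0, 1                   # F(0), F(1)
    n = 1                         # index of b
    while b < limit:
        a, b = b, a + b           # advance one step instead of recomputing from scratch
        n += 1
    return n


if __name__ == "__main__":
    print(first_fib_index_with_digits(1000))
```

Each loop iteration performs a single big-integer addition and one comparison, which is where the speedup over calling fib(N) from scratch at every step comes from.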

In addition to the earlier answer, note that you should avoid globals if you care about performance, so this is much faster than your code with the global i and x:
function fib(N)
    F = [big"0", big"1"]
    for i in 1:N-2
        F_new = F[i] + F[i+1]
        push!(F, F_new)
    end
    return F[N]
end
function test(x = 1000)
    N = 1
    while length(string(fib(N))) <= x
        if length(string(fib(N))) == x
            print(N-1)
            break
        end
        N += 1
    end
end
test()

@AboAmmar shows probably the best "normal" way of writing this. But if you want something even a bit more optimized, you can use in-place BigInt calls. I'm not sure whether I would recommend this, but it's nice to be aware of it.
using Base.GMP.MPZ: add!, set!
function euler25_(limit=big(10)^999)
    a, b = big(1), big(1)
    N = 2
    c = big(0)
    while b <= limit
        add!(c, a, b)
        set!(a, b)
        set!(b, c)
        N += 1
    end
    return N
end
This uses the special BigInt functions in the GMP.MPZ library, and writes values in-place, avoiding most of the allocations, and running 2.5x faster on my laptop.

Related

Fix the solution of "Binary period"

I found this task and am completely stuck on its solution.
A non-empty zero-indexed string S consisting of Q characters is given. The period of this string is the smallest positive integer P such that:
P ≤ Q / 2 and S[K] = S[K+P] for 0 ≤ K < Q − P.
For example, 7 is the period of “abracadabracadabra”. A positive integer M is the binary period of a positive integer N if M is the period of the binary representation of N.
For example, 1651 has the binary representation “11001110011”. Hence, its binary period is 5. On the other hand, 102 does not have a binary period, because its binary representation is “1100110” and it does not have a period.
Write a function in Python that accepts a positive integer N as its parameter and returns the binary period of N, or −1 if N does not have a binary period.
The attached code is still incorrect on some inputs (9, 11, 13, 17, etc.). The goal is to find and fix the bugs in the implementation. You can modify at most 2 lines.
def binary_period(n):
    d = [0] * 30
    l = 0
    while n > 0:
        d[l] = n % 2
        n //= 2
        l += 1
    for p in range(1, 1 + l):
        ok = True
        for i in range(l - p):
            if d[i] != d[i + p]:
                ok = False
                break
        if ok:
            return p
    return -1

I was given this piece of code in an interview.
The aim of the exercise is to find where the bug lies.
The function takes an integer as input and should return its binary period. Internally it first converts the number to binary digits; for example, solution(4) stores the digits 0, 0, 1 (least significant bit first).
So the question is: what is the bug?
The bug here is not a crash. Rather, a behavior that should happen does not happen in the code.
This is known as a logical error: the code does not break, but it does not fulfill the requirements.
Brute-forcing the code will not help, as there are a billion possible inputs.
However, if you run the code for, say, solution(1) to solution(100), you will see that it runs without any glitch. Yet according to the task, it should return -1 whenever the number has no binary period.
The code never gives -1, even if you run it with bigger numbers like 10000.
The bug lies in the -1 that is never returned.
So let's go through the code step by step.
Could it be the while part?
while n > 0:
    d[l] = n % 2
    n //= 2
    l += 1
If you look at this code, it does what it should: it converts the given number to binary, although it stores the digits in reverse order. Instead of 1011 you get 1101, but it does the job.
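As a small illustration of that conversion step, the following sketch isolates the loop into a helper (the name bits_low_to_high is made up here) so the reversed digit order can be inspected directly.

```python
def bits_low_to_high(n):
    """Binary digits of n, least significant bit first, as the while loop stores them."""
    digits = []
    while n > 0:
        digits.append(n % 2)   # lowest remaining bit
        n //= 2                # drop that bit
    return digits


if __name__ == "__main__":
    print(bits_low_to_high(11))   # digits of binary 1011, in reverse order
```

The reversed order is harmless for this task: the condition S[K] = S[K+P] for 0 ≤ K < Q − P holds for a string exactly when it holds for its reverse, so both have the same period.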
The issue lies rather in that part
for p in range(1, 1 + l):
    ok = True
    for i in range(l - p):
        if d[i] != d[i + p]:
            ok = False
            break
    if ok:
        return p
return -1
It never returns -1.
If you put a print in this part of the code, like this, you can watch what happens:
for p in range(1, 1 + l):
    ok = True
    for i in range(l - p):
        print('p and l - p (the number of comparisons in the inner loop):', p, l - p)
        if d[i] != d[i + p]:
            ok = False
            break
    if ok:
        return p
return -1
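If you run the whole script, you can see that the outer loop keeps trying larger values of p even after d[i] no longer equals d[i + p], all the way up to p = l. At p = l the inner loop has nothing left to compare, so ok stays True and p is returned.
But why?
The reason is that p is allowed to run up to l, while the task requires P ≤ Q / 2. Because of that, the upper bound has to be 1 + l//2.
Which gives you the following: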
def solution(n):
    d = [0] * 30
    l = 0
    while n > 0:
        d[l] = n % 2
        n //= 2
        l += 1
    for p in range(1, 1 + l//2): #here you put l//2
        ok = True
        print('p is ', p)
        for i in range(l - p):
            if d[i] != d[i + p]:
                ok = False
                break
        if ok:
            return p
    return -1
Now if you run the code with solution(5), for example, the bug should be fixed and you should get -1.
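To cross-check the fix independently, here is a short sketch that works directly on the binary string and applies the definition from the task (P ≤ Q / 2 and S[K] = S[K+P]). The name binary_period_from_string is an assumption for this illustration; it is not the interview code.

```python
def binary_period_from_string(n):
    """Binary period of a positive integer n, or -1 if it has none."""
    bits = bin(n)[2:]                  # binary representation without the '0b' prefix
    q = len(bits)
    for p in range(1, q // 2 + 1):     # the period must satisfy P <= Q / 2
        if all(bits[k] == bits[k + p] for k in range(q - p)):
            return p
    return -1


if __name__ == "__main__":
    for n in (1651, 102, 9, 11, 13, 17):
        print(n, bin(n)[2:], binary_period_from_string(n))
```

Because the upper bound of the loop is q // 2, a candidate p = q, for which there is nothing to compare, is never accepted.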
Addendum:
This is a difficult test: the algorithm is not easy to work through in a very short time, and the variable names do not convey any meaning.
A first step would be to ask the following questions:
What is the input of the algorithm? In this case, it is an integer.
What is the expected output? In this case, -1 when there is no binary period.
Is it a logical error or a crash? In this case, it is a logical error.
These step-by-step questions (a heuristic) will set you in the right direction to debug a problem.

Following up on Andy's solution and checking @hdlopez's comment, there is a border case when passing int.MaxValue = 2147483647:
if you do not increase the array size to 31 (instead of 30), the function throws an index-out-of-range exception. So two places need to be modified:
1- int[] d = new int[31]; //changed 30 to 31 (unsigned integer)
2- for (p = 1; p < 1 + l / 2; ++p) //added division to l per statement, P ≤ Q / 2

Program never completes when loop is set to 4000000

fib1 = 1
fib2 = 2
i = 0
sum = 0
while i < 3999998:
    fibn = fib1 + fib2
    fib1 = fib2
    fib2 = fibn
    i += 1
    if fibn % 2 == 0:
        sum = sum + fibn
print(sum + 2)
The challenge is to add the even Fibonacci numbers under 4000000. It works for small limits, say 10 numbers, but goes on forever when set to 4000000.
The code is in Python.

Yes, there are inefficiencies in your code, but the biggest problem is that you're mistaken about what you're computing.
At each iteration i increases by one, and you check at each step whether i < 3999998. You are effectively computing the first 4 million Fibonacci numbers.
You should change your loop condition to while fib2 < 3999998.
A couple of other minor optimisations: leverage Python's swapping syntax x, y = y, x and its sum function. Computing the sum once over a list is slightly faster than summing the values up successively in a loop.
a, b = 1, 2
fib = []
while b < 3999998:
    a, b = b, a + b
    if b % 2 == 0:
        fib.append(b)
sum(fib) + 2
This runs in 100000 loops, best of 3: 7.51 µs per loop, a whopping 3 microseconds faster than your current code (once you fix it, that is).
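The key point, bounding the loop by the Fibonacci value rather than by a counter, can also be sketched as a small function. The name even_fib_sum and its limit parameter are assumptions for this illustration. This variant tests each value against the limit before adding it, and it uses the limit of 4000000 stated in the question.

```python
def even_fib_sum(limit):
    """Sum of the even Fibonacci numbers that are strictly below `limit`."""
    total = 0
    a, b = 1, 2
    while b < limit:          # stop on the value, not on an iteration count
        if b % 2 == 0:
            total += b
        a, b = b, a + b
    return total


if __name__ == "__main__":
    print(even_fib_sum(4000000))
```

Only about thirty Fibonacci numbers lie below four million, so the loop finishes immediately.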

You are computing the first 4 million fibonacci numbers. It's going to take a while. It took me almost 5 minutes to compute the result, which was about 817 KB of digits, after I replaced fibn % 2 == 0 with fibn & 1 == 0 - an optimization that makes a big difference on such large numbers.
In other words, your code will eventually finish - it will just take a long time.
Update: your version finished after 42 minutes.

Power Function Python

This function calculates the value of a^b and returns it. My question is: if m = log(b),
the best-case scenario is that it does m+1 iterations,
but what is the worst case, and how many times does it enter the while loop?
def power(a,b):
    result=1
    while b>0: # b is nonzero
        if b % 2 == 1:
            result=result*a
        a=a*a
        b = b//2
    return result

As @EliSadoff stated in a comment, you need an initial value of result in your function. Insert the line
result = 1
just after the def line. The code then works, and this is a standard way to implicitly use the binary representation of b to quickly get the exponentiation. (The loop invariant is that the value of result * a ** b remains constant, which shows the validity of this algorithm.)
The worst case is where your if b % 2 line is executed every time through the while loop. This will happen whenever b is one less than a power of 2, so every digit in b's binary representation is one. The while loop condition while b>0 is still checked only m+1 times, but each loop now has a little more to do.
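To make the best and worst cases visible, here is an instrumented sketch of the same loop. The function power_with_counts and its two counters are additions for this illustration only.

```python
def power_with_counts(a, b):
    """Return (a ** b, loop iterations, multiplications into result) for an integer b >= 0."""
    result = 1
    iterations = 0
    result_multiplications = 0
    while b > 0:
        iterations += 1
        if b % 2 == 1:                    # executed once per 1 bit of b
            result *= a
            result_multiplications += 1
        a *= a
        b //= 2
    return result, iterations, result_multiplications


if __name__ == "__main__":
    for exponent in (15, 16):             # 1111 and 10000 in binary
        print(exponent, power_with_counts(3, exponent)[1:])
```

The number of loop iterations equals the number of binary digits of b, and the number of extra multiplications equals the number of 1 bits, so b = 15 (binary 1111) is a worst case for its length while b = 16 (binary 10000) is a best case.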
There are several ways to speed up your code. Use while b rather than while b>0 and if b & 1 rather than if b % 2 == 1. Use result *= a rather than result = result*a and a *= a rather than a = a*a and b >>= 1 rather than b = b // 2. These are fairly minor improvements, of course. The only way to speed up the loop further is to use non-structured code, which I believe isn't possible in Python. (There is one more modification of a than is necessary but there is no good way to prevent that without a jump into a loop.) There are some variations on this code, such as an inner loop to keep modifying a and b as long as b is even, but that is not always faster.
The final code is then
def power(a, b):
    """Return a ** b, assuming b is a nonnegative integer"""
    result = 1
    while b:
        if b & 1:
            result *= a
        a *= a
        b >>= 1
    return result
I cleaned up your code a little to better fit PEP8 (Python style standards). Note that there is no error checking in your code, especially to ensure that b is a nonnegative integer. I believe my code gets an infinite loop if b is a negative integer while yours returns a false result. So please do that error check! Also note that your code says power(0, 0) == 1 which is pretty standard for such a function but still takes some people by surprise.
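One way to add the error check mentioned above is sketched below. The name checked_power and the choice of exceptions (TypeError and ValueError) are assumptions, not something the original code prescribes.

```python
def checked_power(a, b):
    """Return a ** b by binary exponentiation, rejecting exponents the loop cannot handle."""
    if not isinstance(b, int):
        raise TypeError("b must be an integer")
    if b < 0:
        # For negative b, `b >>= 1` never reaches 0, so the loop would not terminate.
        raise ValueError("b must be nonnegative")
    result = 1
    while b:
        if b & 1:
            result *= a
        a *= a
        b >>= 1
    return result


if __name__ == "__main__":
    print(checked_power(2, 10))
```

Raising early keeps the loop itself unchanged and turns the infinite loop for negative exponents into an immediate, explicit error.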

Python - How to improve efficiency of complex recursive function?

In this video by Mathologer on, amongst other things, infinite sums there are 3 different infinite sums shown at 9:25, when the video freezes suddenly and an elephant deity pops up, challenging the viewer to find "the probable values" of the expressions. I wrote the following script to approximate the last of the three (i.e. 1 + 3.../2...) with increasing precision:
from decimal import Decimal as D, getcontext # for accurate results
def main(c): # faster code when functions defined locally (I think)
    def run1(c):
        c += 1
        if c <= DEPTH:
            return D(1) + run3(c)/run2(c)
        else:
            return D(1)
    def run2(c):
        c += 1
        if c <= DEPTH:
            return D(2) + run2(c)/run1(c)
        else:
            return D(2)
    def run3(c):
        c += 1
        if c <= DEPTH:
            return D(3) + run1(c)/run3(c)
        else:
            return D(3)
    return run1(c)
getcontext().prec = 10 # too much precision isn't currently necessary
for x in range(1, 31):
    DEPTH = x
    print(x, main(0))
Now this is working totally fine for 1 <= x <= 20ish, but it starts taking an eternity for each result after that. I do realize that this is due to the exponentially increasing number of function calls being made at each DEPTH level. It is also clear that I won't be able to calculate the series comfortably up to an arbitrary point. However, the point at which the program slows down is too early for me to clearly identify the limit the series is converging to (it might be 1.75, but I need more DEPTH to be certain).
My question is: How do I get as much out of my script as possible (performance-wise)?
I have tried:
1. finding the mathematical solution to this problem. (No matching results)
2. finding ways to optimize recursive functions in general. According to multiple sources (e.g. this), Python doesn't optimize tail recursion by default, so I tried switching to an iterative style, but I ran out of ideas on how to accomplish this almost instantly...
Any help is appreciated!
NOTE: I know that I could go about this mathematically instead of "brute-forcing" the limit, but I want to get my program running well, now that I've started...

You can store the results of the run1, run2 and run3 functions in arrays to prevent them from being recalculated every time, since in your example, main(1) calls run1(1), which calls run3(2) and run2(2), which in turn call run1(3), run2(3), run1(3) (again) and run3(3), and so on.
You can see that run1(3) is being evaluated twice, and this only gets worse as the number increases; if we count the number of times each function is called, these are the results:
c     run1  run2  run3
1        1     0     0
2        0     1     1
3        1     2     1
4        3     2     3
5        5     6     5
6       11    10    11
7       21    22    21
8       43    42    43
9       85    86    85
...
20    160,000 each (approx.)
...
30    160 million each (approx.)
This is actually a variant of a Pascal triangle, and you could probably figure out the results mathematically; but since here you asked for a non mathematical optimization, just notice how the number of calls increases exponentially; it doubles at each iteration. This is even worse since each call will generate thousands of subsequent calls with higher values, which is what you want to avoid.
Therefore what you want to do is store the value of each call, so that the function does not need to be called a thousand times (and itself make thousands more calls) to always get the same result. This is called memoization.
Here is an example solution in pseudo code:
before calling main, declare the arrays val1, val2, val3, all of size DEPTH, and fill them with -1
function run1(c) # same thing for run2 and run3
    c += 1
    if c <= DEPTH
        local3 = val3(c) # read run3(c)
        if local3 is -1 # if run3(c) hasn't been computed yet
            local3 = run3(c) # we compute it
            val3(c) = local3 # and store it into the array
        local2 = val2(c) # same with run2(c)
        if local2 is -1
            local2 = run2(c)
            val2(c) = local2
        return D(1) + local3/local2 # we use the value we got from the array or from the computation
    else
        return D(1)
Here I use -1 since your functions seem to only generate positive numbers, and -1 is an easy placeholder for the empty cells. In other cases you might have to use an object as Cabu below me did. I however think this would be slower due to the cost of retrieving properties in an object versus reading an array, but I might be wrong about that. Either way, your code should be much, much faster than it is now, with a cost of O(n) instead of O(2^n).
This would technically allow your code to run forever at a constant speed, but the recursion will actually cause an early stack overflow. You might still be able to get to a depth of several thousands before that happens though.
Edit: As ShadowRanger added in the comments, you can keep your original code and simply add @lru_cache(maxsize=n) before each of your run1, run2 and run3 functions, where n is one of the first powers of two above DEPTH (for example, 32 if depth is 25). This requires from functools import lru_cache to work.
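A minimal sketch of the decorator approach follows. The wrapper name approximate and its depth parameter are assumptions introduced so that each depth gets fresh caches, and maxsize=None (an unbounded cache) is used here instead of a power of two.

```python
from decimal import Decimal as D
from functools import lru_cache


def approximate(depth):
    """Evaluate the nested expression to `depth` levels, caching each run1/run2/run3 result."""

    @lru_cache(maxsize=None)
    def run1(c):
        c += 1
        return D(1) + run3(c) / run2(c) if c <= depth else D(1)

    @lru_cache(maxsize=None)
    def run2(c):
        c += 1
        return D(2) + run2(c) / run1(c) if c <= depth else D(2)

    @lru_cache(maxsize=None)
    def run3(c):
        c += 1
        return D(3) + run1(c) / run3(c) if c <= depth else D(3)

    return run1(0)


if __name__ == "__main__":
    for depth in (1, 10, 100):
        print(depth, approximate(depth))
```

With the caches in place each of the three functions is evaluated once per level, so the work grows linearly with depth. The recursion is still as deep as depth, so very large depths will still hit Python's recursion limit.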

With some memoization, you could get up to the stack overflow:
from decimal import Decimal as D, getcontext # for accurate results
def main(c): # faster code when functions defined locally (I think)
    mrun1 = {} # store partial results of run1, run2 and run3
    mrun2 = {} # These are not passed as parameters of the
    mrun3 = {} # run functions so that they can be reset easily
    def run1(c):
        if c in mrun1: # if partial result already computed, return it
            return mrun1[c]
        key = c # remember the argument, since c is incremented below
        c += 1
        if c <= DEPTH:
            v = D(1) + run3(c) / run2(c)
        else:
            v = D(1)
        mrun1[key] = v # else store it and return the value
        return v
    def run2(c):
        if c in mrun2:
            return mrun2[c]
        key = c
        c += 1
        if c <= DEPTH:
            v = D(2) + run2(c) / run1(c)
        else:
            v = D(2)
        mrun2[key] = v
        return v
    def run3(c):
        if c in mrun3:
            return mrun3[c]
        key = c
        c += 1
        if c <= DEPTH:
            v = D(3) + run1(c) / run3(c)
        else:
            v = D(3)
        mrun3[key] = v
        return v
    return run1(c)
getcontext().prec = 150 # too much precision isn't currently necessary
for x in range(1, 997):
    DEPTH = x
    print(x, main(0))
Python will stack overflow if you go over 997.
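If the recursion limit is the obstacle, the same values can be computed bottom-up without any recursion. This is a sketch under the assumption that only the final value of run1(0) is needed; the name approximate_iterative is made up for this illustration.

```python
from decimal import Decimal as D


def approximate_iterative(depth):
    """Value of run1(0) for the given depth, computed from the innermost level outwards."""
    # At the innermost level the three functions return their constants.
    r1, r2, r3 = D(1), D(2), D(3)
    for _ in range(depth):
        # One level further out: each value depends only on the level below it.
        r1, r2, r3 = D(1) + r3 / r2, D(2) + r2 / r1, D(3) + r1 / r3
    return r1


if __name__ == "__main__":
    for depth in (1, 10, 1000, 5000):
        print(depth, approximate_iterative(depth))
```

Only the three values of the previous level are kept, so memory use is constant and the depth is limited only by running time.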

Fast way to place bits for puzzle

There is a puzzle which I am writing code to solve that goes as follows.
Consider a binary vector of length n that is initially all zeros. You choose a bit of the vector and set it to 1. Now a process starts that sets the bit that is the greatest distance from any 1 bit to 1 (or an arbitrary choice of furthest bit if there is more than one). This happens repeatedly with the rule that no two 1 bits can be next to each other. It terminates when there is no more space to place a 1 bit. The goal is to place the initial 1 bit so that as many bits as possible are set to 1 on termination.
Say n = 2. Then wherever we set the bit we end up with exactly one bit set.
For n = 3, if we set the first bit we get 101 in the end. But if we set the middle bit we get 010 which is not optimal.
For n = 4, whichever bit we set we end up with two set.
For n = 5, setting the first gives us 10101 with three bits set in the end.
For n = 7, we need to set the third bit to get 1010101 it seems.
I have written code to find the optimal value but it does not scale well to large n. My code starts to get slow around n = 1000 but I would like to solve the problem for n around 1 million.
#!/usr/bin/python
from __future__ import division
from math import *
def findloc(v):
    count = 0
    maxcount = 0
    id = -1
    for i in xrange(n):
        if (v[i] == 0):
            count += 1
        if (v[i] == 1):
            if (count > maxcount):
                maxcount = count
                id = i
            count = 0
    #Deal with vector ending in 0s
    if (2*count >= maxcount and count >= v.index(1) and count >1):
        return n-1
    #Deal with vector starting in 0s
    if (2*v.index(1) >= maxcount and v.index(1) > 1):
        return 0
    if (maxcount <=2):
        return -1
    return id-int(ceil(maxcount/2))
def addbits(v):
    id = findloc(v)
    if (id == -1):
        return v
    v[id] = 1
    return addbits(v)
#Set vector length
n=21
max = 0
for i in xrange(n):
    v = [0]*n
    v[i] = 1
    v = addbits(v)
    score = sum([1 for j in xrange(n) if v[j] ==1])
    # print i, sum([1 for j in xrange(n) if v[j] ==1]), v
    if (score > max):
        max = score
print max

Latest answer (O(log n) complexity)
If we believe the conjecture by templatetypedef and Aleksi Torhamo (update: proof at the end of this post), there is a closed form solution count(n) calculable in O(log n) (or O(1) if we assume logarithm and bit shifting is O(1)):
Python:
from math import log
def count(n): # The count, using position k conjectured by templatetypedef
    k = p(n-1)+1
    count_left = k/2
    count_right = f(n-k+1)
    return count_left + count_right
def f(n): # The f function calculated using Aleksi Torhamo conjecture
    return max(p(n-1)/2 + 1, n-p(n-1))
def p(n): # The largest power of 2 not exceeding n
    return 1 << int(log(n,2)) if n > 0 else 0
C++:
int log(int n){ // Integer logarithm, by counting the number of leading 0
    return 31-__builtin_clz(n);
}
int p(int n){ // The largest power of 2 not exceeding n
    if(n==0) return 0;
    return 1<<log(n);
}
int f(int n){ // The f function calculated using Aleksi Torhamo conjecture
    int val0 = p(n-1);
    int val1 = val0/2+1;
    int val2 = n-val0;
    return val1>val2 ? val1 : val2;
}
int count(int n){ // The count, using position k conjectured by templatetypedef
    int k = p(n-1)+1;
    int count_left = k/2;
    int count_right = f(n-k+1);
    return count_left + count_right;
}
This code can calculate the result for n=100,000,000 (and even n=1e24 in Python!) correctly in no time [1].
I have tested the code with various values of n (using my O(n) solution as the standard, see the Old answer section below), and it still seems correct.
This code relies on the two conjectures by templatetypedef and Aleksi Torhamo [2]. Does anyone want to prove those? =D (Update 2: PROVEN)
[1] By no time, I mean almost instantly.
[2] The conjecture by Aleksi Torhamo on the f function has been empirically verified for n<=100,000,000.
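The Python snippet above relies on Python 2 integer division and on a floating-point logarithm. As a hedged sketch, the same closed form can be written for Python 3 using // and int.bit_length(), which stays exact for very large n. The function names below are assumptions for this illustration, and the sketch has only been checked against the values quoted in this answer.

```python
def largest_power_of_two_at_most(n):
    """Largest power of 2 not exceeding n, or 0 when n is 0 (the role of p above)."""
    return 1 << (n.bit_length() - 1) if n > 0 else 0


def f_closed(n):
    """Closed form of f based on the conjecture attributed to Aleksi Torhamo."""
    p = largest_power_of_two_at_most(n - 1)
    return max(p // 2 + 1, n - p)


def count_closed(n):
    """Final number of set bits when the first bit is placed at position p(n-1) + 1."""
    k = largest_power_of_two_at_most(n - 1) + 1
    count_left = k // 2
    count_right = f_closed(n - k + 1)
    return count_left + count_right


if __name__ == "__main__":
    for n in (7, 1000000, 10000000, 10 ** 24):
        print(n, count_closed(n))
```

Using bit_length() avoids rounding problems that log(n, 2) can have near exact powers of two.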
Old answer (O(n) complexity)
I can return the count of n=1,000,000 (the result is 475712) in 1.358s (in my iMac) using Python 2.7. Update: It's 0.198s for n=10,000,000 in C++. =)
Here is my idea, which achieves O(n) time complexity.
The Algorithm
Definition of f(n)
Define f(n) as the number of bits that will be set on bitvector of length n, assuming that the first and last bit are set (except for n=2, where only the first or last bit is set). So we know some values of f(n) as follows:
f(1) = 1
f(2) = 1
f(3) = 2
f(4) = 2
f(5) = 3
Note that this is different from the value that we are looking for, since the initial bit might not be at the first or last, as calculated by f(n). For example, we have f(7)=3 instead of 4.
Note that this can be calculated rather efficiently (amortized O(n) to calculate all values of f up to n) using the recurrence relation:
f(2n) = f(n)+f(n+1)-1
f(2n+1) = 2*f(n+1)-1
for n>=5, since the next bit set following the rule will be the middle bit, except for n=1,2,3,4. Then we can split the bitvector into two parts, each independent of each other, and so we can calculate the number of bits set using f( floor(n/2) ) + f( ceil(n/2) ) - 1, as illustrated below:
    n=11                n=13
10000100001         1000001000001
<---->              <----->
 f(6)<---->          f(7) <----->
      f(6)                 f(7)

    n=12                n=14
100001000001        10000010000001
<---->              <----->
 f(6)<----->         f(7) <------>
      f(7)                 f(8)
we have the -1 in the formula to exclude the double count of the middle bit.
Now we are ready to count the solution of original problem.
Definition of g(n,i)
Define g(n,i) as the number of bits that will be set on bitvector of length n, following the rules in the problem, where the initial bit is at the i-th bit (1-based). Note that by symmetry the initial bit can be anywhere from the first bit up to the ceil(n/2)-th bit. And for those cases, note that the first bit will be set before any bit in between the first and the initial, and so is the case for the last bit. Therefore the number of bit set in the first partition and the second partition is f(i) and f(n+1-i) respectively.
So the value of g(n,i) can be calculated as:
g(n,i) = f(i) + f(n+1-i) - 1
following the idea when calculating f(n).
Now, to calculate the final result is trivial.
Definition of g(n)
Define g(n) as the count being looked for in the original problem. We can then take the maximum of all possible i, the position of initial bit:
g(n) = max_{i=1..ceil(n/2)} (f(i) + f(n+1-i) - 1)
Python code:
import time
mem_f = [0,1,1,2,2]
mem_f.extend([-1]*(10**7)) # This will take around 40MB of memory
def f(n):
global mem_f
if mem_f[n]>-1:
return mem_f[n]
if n%2==1:
mem_f[n] = 2*f((n+1)/2)-1
return mem_f[n]
else:
half = n/2
mem_f[n] = f(half)+f(half+1)-1
return mem_f[n]
def g(n):
return max(f(i)+f(n+1-i)-1 for i in range(1,(n+1)/2 + 1))
def main():
while True:
n = input('Enter n (1 <= n <= 10,000,000; 0 to stop): ')
if n==0: break
start_time = time.time()
print 'g(%d) = %d, in %.3fs' % (n, g(n), time.time()-start_time)
if __name__=='__main__':
main()
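For readers on Python 3, the same recurrences can be sketched with functools.lru_cache doing the memoization instead of a preallocated list. The names f_memo and g_max are assumptions for this illustration; the base values f(0)..f(4) are taken from mem_f above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f_memo(n):
    """Bits set in a vector of length n whose first and last bits are already set."""
    if n <= 4:
        return (0, 1, 1, 2, 2)[n]                # base values f(0)..f(4)
    half = n // 2
    if n % 2 == 1:
        return 2 * f_memo(half + 1) - 1          # f(2m+1) = 2*f(m+1) - 1
    return f_memo(half) + f_memo(half + 1) - 1   # f(2m) = f(m) + f(m+1) - 1


def g_max(n):
    """Best final count over all starting positions i = 1..ceil(n/2)."""
    return max(f_memo(i) + f_memo(n + 1 - i) - 1 for i in range(1, (n + 1) // 2 + 1))


if __name__ == "__main__":
    for n in (3, 5, 7, 1000000):
        print(n, g_max(n))
```

The recursion depth of f_memo is only about log2(n), so the recursion limit is not a concern here; the linear cost comes from the loop over starting positions in g_max.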
Complexity Analysis
Now, the interesting thing is, what is the complexity of calculating g(n) with the method described above?
We should first note that we iterate over n/2 values of i, the position of initial bit. And in each iteration we call f(i) and f(n+1-i). Naive analysis will lead to O(n * O(f(n))), but actually we used memoization on f, so it's much faster than that, since each value of f(i) is calculated only once, at most. So the complexity is actually added by the time required to calculate all values of f(n), which would be O(n + f(n)) instead.
So what's the complexity of initializing f(n)?
We can assume that we precompute every value of f(n) first before calculating g(n). Note that due to the recurrence relation and the memoization, generating the whole values of f(n) takes O(n) time. And the next call to f(n) will take O(1) time.
So, the overall complexity is O(n+n) = O(n), as evidenced by this running time in my iMac for n=1,000,000 and n=10,000,000:
> python max_vec_bit.py
Enter n (1 <= n <= 10,000,000; 0 to stop): 1000000
g(1000000) = 475712, in 1.358s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
>
> <restarted the program to remove the effect of memoization>
>
> python max_vec_bit.py
Enter n (1 <= n <= 10,000,000; 0 to stop): 10000000
g(10000000) = 4757120, in 13.484s
Enter n (1 <= n <= 10,000,000; 0 to stop): 6745231
g(6745231) = 3145729, in 3.072s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
And as a by-product of memoization, the calculation of lesser value of n will be much faster after the first call to large n, as you can also see in the sample run. And with language better suited for number crunching such as C++, you might get significantly faster running time
I hope this helps. =)
The code using C++, for performance improvement
The result in C++ is about 68x faster (measured by clock()):
> ./a.out
Enter n (1 <= n <= 10,000,000; 0 to stop): 1000000
g(1000000) = 475712, in 0.020s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
>
> <restarted the program to remove the effect of memoization>
>
> ./a.out
Enter n (1 <= n <= 10,000,000; 0 to stop): 10000000
g(10000000) = 4757120, in 0.198s
Enter n (1 <= n <= 10,000,000; 0 to stop): 6745231
g(6745231) = 3145729, in 0.047s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
Code in C++:
#include <cstdio>
#include <cstring>
#include <ctime>
int mem_f[10000001];
int f(int n){
    if(mem_f[n]>-1)
        return mem_f[n];
    if(n%2==1){
        mem_f[n] = 2*f((n+1)/2)-1;
        return mem_f[n];
    } else {
        int half = n/2;
        mem_f[n] = f(half)+f(half+1)-1;
        return mem_f[n];
    }
}
int g(int n){
    int result = 0;
    for(int i=1; i<=(n+1)/2; i++){
        int cnt = f(i)+f(n+1-i)-1;
        result = (cnt > result ? cnt : result);
    }
    return result;
}
int main(){
    memset(mem_f,-1,sizeof(mem_f));
    mem_f[0] = 0;
    mem_f[1] = mem_f[2] = 1;
    mem_f[3] = mem_f[4] = 2;
    clock_t start, end;
    while(true){
        int n;
        printf("Enter n (1 <= n <= 10,000,000; 0 to stop): ");
        scanf("%d",&n);
        if(n==0) break;
        start = clock();
        int result = g(n);
        end = clock();
        printf("g(%d) = %d, in %.3fs\n",n,result,((double)(end-start))/CLOCKS_PER_SEC);
    }
}
Proof
Note that, for the sake of keeping this answer (which is already very long) simple, I've skipped some steps in the proof.
Conjecture of Aleksi Torhamo on the value of f
For n >= 1, prove that:
f(2^n + k) = 2^{n-1} + 1 for k = 1, 2, …, 2^{n-1} ...(1)
f(2^n + k) = k for k = 2^{n-1}+1, …, 2^n ...(2)
given f(0)=f(1)=f(2)=1
The result above can be easily proven using induction on the recurrence relation, by considering the four cases:
Case 1: (1) for even k
Case 2: (1) for odd k
Case 3: (2) for even k
Case 4: (2) for odd k
Suppose we have the four cases proven for n. Now consider n+1.
Case 1:
f(2^{n+1} + 2i) = f(2^n + i) + f(2^n + i + 1) - 1, for i = 1, …, 2^{n-1}
= (2^{n-1} + 1) + (2^{n-1} + 1) - 1
= 2^n + 1
Case 2:
f(2^{n+1} + 2i + 1) = 2*f(2^n + i + 1) - 1, for i = 0, …, 2^{n-1} - 1
= 2*(2^{n-1} + 1) - 1
= 2^n + 1
Case 3:
f(2^{n+1} + 2i) = f(2^n + i) + f(2^n + i + 1) - 1, for i = 2^{n-1}+1, …, 2^n
= i + (i+1) - 1
= 2i
Case 4:
f(2^{n+1} + 2i + 1) = 2*f(2^n + i + 1) - 1, for i = 2^{n-1}+1, …, 2^n - 1
= 2*(i+1) - 1
= 2i+1
So by induction the conjecture is proven.
Conjecture of templatetypedef on the best position
For n >= 1 and k = 1, …, 2^n, prove that g(2^n + k) = g(2^n + k, 2^n + 1)
That is, prove that placing the first bit on the (2^n + 1)-th position gives the maximum number of bits set.
The proof:
First, we have
g(2^n + k, 2^n + 1) = f(2^n + 1) + f(k-1) - 1
Next, by the formula of f, we have the following equalities:
f(2^n + 1 - i) = f(2^n + 1), for i = -2^{n-1}, …, -1
f(2^n + 1 - i) = f(2^n + 1) - i, for i = 1, …, 2^{n-2} - 1
f(2^n + 1 - i) = f(2^n + 1) - 2^{n-2}, for i = 2^{n-2}, …, 2^{n-1}
and also the following inequalities:
f(k-1+i) <= f(k-1), for i = -2^{n-1}, …, -1
f(k-1+i) <= f(k-1) + i, for i = 1, …, 2^{n-2} - 1
f(k-1+i) <= f(k-1) + 2^{n-2}, for i = 2^{n-2}, …, 2^{n-1}
and so we have:
f(2^n + 1 - i) + f(k-1+i) <= f(2^n + 1) + f(k-1), for i = -2^{n-1}, …, 2^{n-1}
Now, note that we have:
g(2^n + k) = max_{i=1..ceil(2^{n-1} + 1 - k/2)} (f(i) + f(2^n + k + 1 - i) - 1)
<= f(2^n + 1) + f(k-1) - 1
= g(2^n + k, 2^n + 1)
And so the conjecture is proven.

So in a break with my normal tradition of not posting algorithms I don't have a proof for, I think I should mention that there's an algorithm that appears to be correct for numbers up to 50,000+ and runs in O(log n) time. This is due to Sophia Westwood, who I worked on this problem with for about three hours today. All credit for this is due to her. Empirically it seems to work beautifully, and it's much, much faster than the O(n) solutions.
One observation about the structure of this problem is that if n is sufficiently large (n ≥ 5), then if you put a 1 anywhere, the problem splits into two subproblems, one to the left of the 1 and one to the right. Although the 1s might be placed in the different halves at different times, the eventual placement is the same as if you solved each half separately and combined them back together.
The next observation is this: suppose you have an array of size 2^k + 1 for some k. In that case, suppose that you put a 1 on either side of the array. Then:
The next 1 is placed on the other side of the array.
The next 1 is placed in the middle.
You now have two smaller subproblems of size 2^{k-1} + 1.
The important part about this is that the resulting bit pattern is an alternating series of 1s and 0s. For example:
For 5 = 4 + 1, we get 10101
For 9 = 8 + 1, we get 101010101
For 17 = 16 + 1, we get 10101010101010101
The reason this matters is the following: suppose you have n total elements in the array and let k be the largest possible value for which 2^k + 1 ≤ n. If you place the 1 at position 2^k + 1, then the left part of the array up to that position will end up getting tiled with alternating 1s and 0s, which puts a lot of 1s into the array.
What's not obvious is that placing the 1 bit there, for all numbers up to 50,000, appears to yield an optimal solution! I've written a Python script that checks this (using a recurrence relation similar to @justhalf's) and it seems to work well. The reason that this fact is so useful is that it's really easy to compute this index. In particular, if 2^k + 1 ≤ n, then 2^k ≤ n - 1, so k ≤ lg (n - 1). Choosing the value ⌊lg (n - 1)⌋ as your choice of k then lets you compute the bit index by computing 2^k + 1. This value of k can be computed in O(log n) time and the exponentiation can be done in O(log n) time as well, so the total runtime is Θ(log n).
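The index computation described above can be sketched in a few lines of Python. The function name conjectured_start is an assumption for this illustration, positions are 1-based as in the text, and int.bit_length() stands in for ⌊lg (n - 1)⌋ to avoid floating point.

```python
def conjectured_start(n):
    """1-based position 2**k + 1 with k = floor(lg(n - 1)), defined for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    k = (n - 1).bit_length() - 1     # floor(log2(n - 1)) using integer arithmetic only
    return (1 << k) + 1


if __name__ == "__main__":
    for n in (5, 7, 9, 17, 1000000):
        print(n, conjectured_start(n))
```

For n = 7 this gives position 5, which by symmetry is the mirror image of the third bit mentioned in the question.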
The only issue is that I haven't formally proven that this works. All I know is that it's right for the first 50,000 values we've tried. :-)
Hope this helps!

I'll attach what I have. Same as yours, alas, time is basically O(n**3). But at least it avoids recursion (etc), so won't blow up when you get near a million ;-) Note that this returns the best vector found, not the count; e.g.,
>>> solve(23)
[6, 0, 11, 0, 1, 0, 0, 10, 0, 5, 0, 9, 0, 3, 0, 0, 8, 0, 4, 0, 7, 0, 2]
So it also shows the order in which the 1 bits were chosen. The easiest way to get the count is to pass the result to max().
>>> max(solve(23))
11
Or change the function to return maxsofar instead of best.
If you want to run numbers on the order of a million, you'll need something radically different. You can't even afford quadratic time for that (let alone this approach's cubic time). Unlikely to get such a huge O() improvement from fancier data structures - I expect it would require deeper insight into the mathematics of the problem.
def solve(n):
    maxsofar, best = 1, [1] + [0] * (n-1)
    # by symmetry, no use trying starting points in last half
    # (would be a mirror image).
    for i in xrange((n + 1)//2):
        v = [0] * n
        v[i] = count = 1
        # d21[i] = distance to closest 1 from index i
        d21 = range(i, 0, -1) + range(n-i)
        while 1:
            d, j = max((d, j) for j, d in enumerate(d21))
            if d >= 2:
                count += 1
                v[j] = count
                d21[j] = 0
                k = 1
                while j-k >= 0 and d21[j-k] > k:
                    d21[j-k] = k
                    k += 1
                k = 1
                while j+k < n and d21[j+k] > k:
                    d21[j+k] = k
                    k += 1
            else:
                if count > maxsofar:
                    maxsofar = count
                    best = v[:]
                break
    return best
