Recently I came across a problem: given two integers A and B, we need to convert A to B in the minimum number of steps.
We can perform the following operations on A:
If A is odd, decrease it by 1
If A is even, increase it by 1
Multiply A (even or odd) by 2
If A is even, divide it by 2
Again, we have to find the minimum number of steps needed to convert A to B.
The constraints are 0 < A, B < 10^18
My approach:
I tried to solve the problem using breadth-first search, adding every number reachable from the current value to a queue, but it fails on the higher constraints, i.e., it times out.
Can anyone suggest a faster alternative?
EDIT: A is not necessarily less than B
Basically you have the following operations:
flip the lowest bit
shift bits to the left or to the right
Assume you have A == 0; how would you construct B? Right: you flip the low bit and shift the number to the left, repeatedly. For example, if B == 5, which is 0101 in binary, you will need 2 flips and 2 shifts.
Now we have to deal with the case when A != 0 -- here you have to turn the low bit to 0 and shift right to clean up the mess. For example, if you have A == 32, which is 100000 in binary, and you want to get 5 (0101 in binary), you have to do three shifts to the right, then flip the low bit, and you're done.
So, all you have to do is to:
count how many flips/right-shifts you have to do until the highest bit of A lines up with the highest bit of B
then count how many flips/right-shifts you need to clean up the rest
count how many flips/left-shifts you need to rebuild the lower part of B
OK, a few hours passed; here's the solution. First, a useful function that says how many ops we need to create a number:
def bit_count(num):
    # the number of non-zero bits in a number
    return bin(num).count('1')

def num_ops(num):
    # number of shifts + number of flips
    return num.bit_length() + bit_count(num)
Now, assume A > B, because otherwise we can swap them while keeping the number of operations the same. Here's how far we have to shift A to make it start from the same bit as B:
needed_shifts = A.bit_length() - B.bit_length()
while doing that we need to flip a few bits:
mask = (1 << needed_shifts) - 1  # the bits shifted out; each 1 among them must be flipped first
needed_flips = bit_count(A & mask)
Now we count how many ops are required to clean A and rebuild B:
A >>= needed_shifts
clean_shifts = (A & ~B).bit_length()
clean_flips = bit_count(A & ~B)
rebuild_shifts = (B & ~A).bit_length()
rebuild_flips = bit_count(B & ~A)
Finally we sum it all up:
result_ops = needed_shifts + needed_flips + max(clean_shifts, rebuild_shifts) * 2 + clean_flips + rebuild_flips
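Stitched together, the whole thing looks something like this (my consolidation of the snippets above, following the final formula exactly as stated; treat it as a sketch rather than a verified solution):

def min_ops(A, B):
    # the cost is symmetric, so arrange for A to be the longer number
    if A.bit_length() < B.bit_length():
        A, B = B, A
    needed_shifts = A.bit_length() - B.bit_length()
    mask = (1 << needed_shifts) - 1
    needed_flips = bit_count(A & mask)      # 1-bits shifted out on the way down
    A >>= needed_shifts
    clean_shifts = (A & ~B).bit_length()    # clear what A has but B lacks
    clean_flips = bit_count(A & ~B)
    rebuild_shifts = (B & ~A).bit_length()  # rebuild what B has but A lacks
    rebuild_flips = bit_count(B & ~A)
    return (needed_shifts + needed_flips
            + max(clean_shifts, rebuild_shifts) * 2
            + clean_flips + rebuild_flips)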
That's all, folks! =)
The list of available operations is symmetric: 2 sets of operations, each the opposite of the other:
the last bit can be flipped
the number can be shifted left one position, or shifted right one position if the low bit is 0.
Hence it takes the same number of operations to go from A to B or from B to A.
Going from A to B takes at most the number of operations needed to go from A to 0 plus the number needed to go from B to 0. These operations strictly decrease the values of A and B, so if an intermediate value can be reached from both A and B, there is no need to go all the way down to 0.
Here is a simple function that performs the individual steps on A and B and stops as soon as this common number is found:
def num_ops(a, b):
    # compute the number of ops to transform a into b;
    # by symmetry, the same number of ops is needed to transform b into a
    count = 0
    while a != b:
        if a > b:
            if (a & 1) != 0:
                a -= 1
            else:
                a >>= 1
        else:
            if (b & 1) != 0:
                b -= 1
            else:
                b >>= 1
        count += 1
    return count
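As a quick sanity check (my example, not from the original answer):

print(num_ops(10, 95))   # 7, e.g. 10 -> 11 -> 22 -> 23 -> 46 -> 47 -> 94 -> 95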
This problem can be optimised using dynamic programming.
I wrote the following code with a few things in mind:
Infinite recursion must be carefully avoided with base conditions. For example, if A=0 and B<0, then no answer exists.
If the function convert(A, B) is re-entered for a state (A, B) whose answer has not previously been calculated, the recursion is terminated, as no answer exists for this case. For example: (80, 100) -> (160, 100) -> (80, 100) -> (160, 100) -> ........
This is done by maintaining a count for each state in a map and defining a maximum recursive-call limit (3 in the following program) for the same DP state.
The map dp maintains the answer for each state (A, B), and the map iterationsCount maintains the number of times the same state (A, B) has been visited.
Have a look at the following implementation:
#include <utility>
#include <iterator>
#include <map>
#include <set>
#include <iostream>
#include <climits>

typedef long long int LL;

std::map<std::pair<LL, LL>, LL> dp;
std::map<std::pair<LL, LL>, int> iterationsCount;

LL IMPOSSIBLE = (LL)1e9;
LL MAX_RECURSION_LIMIT = 3;

LL convert(LL a, LL b)
{
    //std::cout << a << " " << b << std::endl;

    // To avoid infinite recursion:
    if (iterationsCount.find(std::make_pair(a, b)) != iterationsCount.end() &&
        iterationsCount[std::make_pair(a, b)] > MAX_RECURSION_LIMIT &&
        dp.find(std::make_pair(a, b)) == dp.end()) {
        return IMPOSSIBLE;
    }

    // Maintaining count of each state (A, B)
    iterationsCount[std::make_pair(a, b)]++;

    LL value1, value2, value3, value4, value5;
    value1 = value2 = value3 = value4 = value5 = IMPOSSIBLE;

    if (dp.find(std::make_pair(a, b)) != dp.end()) {
        return dp[std::make_pair(a, b)];
    }

    // Base case
    if (a == 0 && b < 0) {
        return IMPOSSIBLE;
    }

    // Base case
    if (a == b)
        return 0;

    // Conditions
    if (a % 2 == 1) {
        if (a < b) {
            value1 = 1 + convert(2 * a, b);
        }
        else if (a > b) {
            value2 = 1 + convert(a - 1, b);
        }
    }
    else {
        if (a < b) {
            value3 = 1 + convert(a * 2, b);
            value4 = 1 + convert(a + 1, b);
        }
        else if (a > b) {
            value5 = 1 + convert(a / 2, b);
        }
    }

    LL ans = std::min(value1, std::min(value2, std::min(value3, std::min(value4, value5))));
    dp[std::make_pair(a, b)] = ans;
    return ans;
}

int main() {
    LL ans = convert(10, 95);
    if (ans == IMPOSSIBLE) {
        std::cout << "Impossible";
    } else {
        std::cout << ans;
    }
    return 0;
}
I just found this algorithm to compute the greatest common divisor in my lecture notes:
public static int gcd(int a, int b) {
    while (b != 0) {
        final int r = a % b;
        a = b;
        b = r;
    }
    return a;
}
So r is the remainder when dividing a by b (the mod). Then b is assigned to a, the remainder is assigned to b, and a is returned. I can't for the life of me see how this works!
And then, apparently this algorithm doesn't work for all cases, and this one must be used instead:
public static int gcd(int a, int b) {
    final int gcd;
    if (b != 0) {
        final int q = a / b;
        final int r = a % b; // a == r + q * b AND r == a - q * b.
        gcd = gcd(b, r);
    } else {
        gcd = a;
    }
    return gcd;
}
I don't understand the reasoning behind this. I generally get recursion and am good at Java but this is eluding me. Help please?
The Wikipedia article contains an explanation, but it's not easy to find immediately (and a procedure plus proof doesn't always answer the question "why it works").
Basically it comes down to the fact that for two integers a, b (assuming a >= b), it is always possible to write a = bq + r where 0 <= r < b.
If d=gcd(a,b) then we can write a=ds and b=dt. So we have ds = qdt + r. Since the left hand side is divisible by d, the right hand side must also be divisible by d. And since qdt is divisible by d, the conclusion is that r must also be divisible by d.
To summarise: we have a = bq + r where r < b and a, b and r are all divisible by gcd(a,b).
Since a >= b > r, we have two cases:
If r = 0 then a = bq, and so b divides both b and a. Hence gcd(a,b)=b.
Otherwise (r > 0), we can reduce the problem of finding gcd(a,b) to the problem of finding gcd(b,r) which is exactly the same number (as a, b and r are all divisible by d).
Why is this a reduction? Because r < b. So we are dealing with numbers that are definitely smaller. This means that we only have to apply this reduction a finite number of times before we reach r = 0.
Now, r = a % b which hopefully explains the code you have.
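To tie the argument back to code, here is the reduction written directly as a recursive Python function (my rendering, equivalent to the Java in the question):

def gcd(a, b):
    # gcd(a, b) == gcd(b, r) where a == b*q + r, so recurse until r == 0
    if b == 0:
        return a
    return gcd(b, a % b)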
They're equivalent. First thing to notice is that q in the second program is not used at all. The other difference is just iteration vs. recursion.
As to why it works, the Wikipedia page linked above is good. The first illustration in particular is effective to convey intuitively the "why", and the animation below then illustrates the "how".
Given that q is never used, I don't see a difference between your plain iterative function and the recursive function... both do

gcd(first number, second number)
    as long as (second number > 0) {
        remainder = first % second;
        gcd = gcd(second as first, remainder as second);
    }
Barring trying to apply this to non-integers, under which circumstances does this algorithm fail?
(also see http://en.wikipedia.org/wiki/Euclidean_algorithm for lots of detailed info)
Here is an interesting blog post: Tominology.
There a lot of the intuition behind the Euclidean algorithm is discussed. It is implemented in JavaScript, but I believe that if one wants, there is no difficulty in converting the code to Java.
Here is a very useful explanation that I found.
For those too lazy to open it, this is what it says:
Consider the example where you have to find the GCD of (3084, 1424). Let's assume that d is the GCD, which means d | 3084 and d | 1424 (using the symbol '|' to say 'divides').
It follows that d | (3084 - 1424). Now we'll try to reduce these numbers which are divisible by d (in this case 3084 and 1424) as much as possible, so that we reach 0 as one of the numbers. Remember that GCD(a, 0) is a.
Since d | (3084 - 1424), it follows that d | ( 3084 - 2(1424) )
which means d | 236.
Hint : (3084 - 2*1424 = 236)
Now forget about the initial numbers, we just need to solve for d, and we know that d is the greatest number that divides 236, 1424 and 3084. So we use the smaller two numbers to proceed because it'll converge the problem towards 0.
d | 1424 and d | 236 implies that d | (1424 - 236).
So, d | ( 1424 - 6(236) ) => d | 8.
Now we know that d is the greatest number that divides 8, 236, 1424 and 3084. Taking the smaller two again, we have
d | 236 and d | 8, which implies d | (236 - 8).
So, d | ( 236 - 29(8) ) => d | 4.
Again the list of numbers divisible by d increases and converges (the numbers are getting smaller, closer to 0). As it stands now, d is the greatest number that divides 4, 8, 236, 1424, 3084.
Taking same steps,
d | 8 and d | 4 implies d | (8-4).
So, d | ( 8 - 2(4) ) => d | 0.
The list of numbers divisible by d is now 0, 4, 8, 236, 1424, 3084.
GCD of (a, 0) is always a. So, as soon as you have 0 as one of the two numbers, the other number is the gcd of original two and all those which came in between.
This is exactly what your code is doing. You can recognize the terminal condition as GCD (a, 0) = a.
The other step is to find the remainder of the two numbers, and choose that and the smaller of the previous two as the new numbers.
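If it helps to see the pairs shrink, here is a small Python trace of the same example (a throwaway helper of mine, not from the linked explanation):

def gcd_trace(a, b):
    # prints (3084, 1424) -> (1424, 236) -> (236, 8) -> (8, 4), returns 4
    while b != 0:
        print(a, b)
        a, b = b, a % b
    return a

print(gcd_trace(3084, 1424))  # 4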
I want to find the sum of bitwise OR of all possible subarrays of a given array.
This is what I did till now:
from operator import ior
from functools import reduce

n = int(input())
a = list(map(int, input().split()))
total = 0
for i in range(1, n+1):
    for j in range(n+1-i):
        total += reduce(ior, a[j:j+i])
print(total)
But it is quite slow. How can I optimise it?
Since this question is from a live competition, I didn't answer it until now.
Code:
#include <bits/stdc++.h>
using namespace std;
#define size 32
#define INT_SIZE 32
typedef long long int Int;
typedef unsigned long long int Unt;

// Driver code
int main()
{
    Int n;
    cin >> n;
    Int arr[n];
    for(int i = 0; i < n; i++)
        cin >> arr[i];
    int zeros[size];
    for(int i = 0; i < size; i++)
        zeros[i] = 1;
    unsigned long long int sum = 0;
    for(int i = 0; i < n; i++)
    {
        for(int j = 0; j < size; j++)
        {
            if(!(arr[i] & 1))
                zeros[j]++;
            else
            {
                // shift an unsigned 64-bit 1 to avoid overflow for large j
                sum += (Unt)zeros[j] * ((Unt)1 << j) * (Unt)(n - i);
                zeros[j] = 1;
            }
            arr[i] >>= 1;
        }
    }
    cout << sum;
    return 0;
}
Logic:
Note: this is my thinking process, so it may not be easy to follow. Apologies if I fail to make it clear.
Take an example:
5 (size of array)
1 2 3 4 5 (array)
for,
1 = 1.0
1,2 = 1.0 & 2.1
1,2,3 = 1.0 & 2.1 [3.0 & 3.1 won't be useful because they're already taken by 1 & 2]
1,2,3,4 = 1.0 & 2.1 & 4.2
1,2,3,4,5 = 1.0 & 2.1 & 4.2 are useful.
In the explanation above, X.Y means the Yth bit of number X is taken for the OR operation.
For,
2 = 2.1
2,3 = 2.1 & 3.0 [Since 1.0 won't be available]
{continues..}
So, if you observe carefully: although 3.0 is available, it's not used while 1 is present.
For a bit to be used, the same bit of the previous numbers should be 0. [Remember this; we'll use it later.]
We'll create one array, named zeros, which at each bit position counts the numbers seen since the last set bit at that position [this sentence may be confusing; keep reading and it should become clear].
For given array,
At 1: 0 0 0
At 2: 1 1 0 {binary of 1 is 001}
At 3: 2 0 1 {binary of 2 is 010}
At 4: 3 0 0 {binary of 3 is 011}
At 5: 0 1 1 {binary of 4 is 100}
End: 0 2 0 {binary of 5 is 101}
What we did above: if a bit is set, we reset its counter; otherwise we add to the count, so that we know how many numbers don't have a set bit at each position. For example, before 3, 2 numbers don't have a set bit at position 2^2, and 1 number doesn't have a set bit at position 2^0.
Now, we just need to multiply depending on the set bits.
If a bit is set, we add (zeros[i]+1) * (2^i) * (n-i).
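The same idea in a short Python sketch (my own rendering of the logic above, with the zeros counter starting at 0 and the +1 applied in the sum):

def or_sum(a):
    # O(32*n): each set bit pays once for every subarray in which
    # it is the first occurrence of that bit
    n = len(a)
    total = 0
    for j in range(32):          # assumes values fit in 32 bits
        zeros = 0                # elements since the last one with bit j set
        for i, x in enumerate(a):
            if (x >> j) & 1:
                # starts: anywhere in the zeros-gap or i itself; ends: i..n-1
                total += (zeros + 1) * (n - i) * (1 << j)
                zeros = 0
            else:
                zeros += 1
    return total

print(or_sum([1, 2, 3, 4, 5]))  # 71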
Let's first find the sum of the bitwise OR of the subarrays ending at position i. Let or be the OR of all array elements from 1 to i, and let a[i] be the ith element. The bits which are not set in a[i] but are set in or must come from some previous elements.
Let's take an example here:
1 2 2
At position 3, or = 3, a[i] = 2, or^a[i] = 1.
This means bit 0 comes from some previous element; if we removed that element, the OR of some subarrays ending at i would shrink. The last position where bit 0 is on is position 1.
So the answer for position i is:
ans = or*i
for all bits from 0 to m,
    ans -= (i - lastposition[bit]) * (1 << bit); // lastposition[] gives the last index at which the bit is on
Why the last position? Because the indexes before lastposition[] where this bit is on have no impact: the OR remains the same due to the presence of this bit at lastposition[].
The final answer can be found by summing up the answers for all 1 <= i <= n.
#include <bits/stdc++.h>
#define ll long long
#define pb push_back
#define rep(i,a,b) for(int i = a; i <= b; i++)
using namespace std;

ll r, a, sum, pos[31]; // bits 0..30

int main()
{
    int n;
    cin >> n;
    rep(i,1,n)
    {
        cin >> a;
        r |= a;
        ll ex = r ^ a;
        ll ans = i * r;
        rep(bit,0,30)
            if(ex & (1 << bit))
                ans -= ((ll)(i - pos[bit]) * ((ll)1 << bit));
        sum += ans;
        rep(bit,0,30)
            if(a & (1 << bit))
                pos[bit] = i;
    }
    cout << sum << '\n';
}
I need some help understanding Python solutions of LeetCode 371, "Sum of Two Integers". I found that https://discuss.leetcode.com/topic/49900/python-solution/2 is the most upvoted Python solution, but I am having trouble understanding it.
How should I understand the usage of "% MASK", and why is "MASK = 0x100000000"?
How should I understand "~((a % MIN_INT) ^ MAX_INT)"?
When the sum goes beyond MAX_INT, the function yields a negative value (for example getSum(2147483647, 2) = -2147483647); isn't that incorrect?
class Solution(object):
    def getSum(self, a, b):
        """
        :type a: int
        :type b: int
        :rtype: int
        """
        MAX_INT = 0x7FFFFFFF
        MIN_INT = 0x80000000
        MASK = 0x100000000
        while b:
            a, b = (a ^ b) % MASK, ((a & b) << 1) % MASK
        return a if a <= MAX_INT else ~((a % MIN_INT) ^ MAX_INT)
Let's disregard the MASK, MAX_INT and MIN_INT for a second.
Why does this black magic bitwise stuff work?
The reason why the calculation works is because (a ^ b) is "summing" the bits of a and b. Recall that bitwise xor is 1 when the bits differ, and 0 when the bits are the same. For example (where D is decimal and B is binary), 20D == 10100B, and 9D = 1001B:
10100
 1001
-----
11101
and 11101B == 29D.
But, if you have a case with a carry, it doesn't work so well. For example, consider adding (bitwise xor) 20D and 20D.
10100
10100
-----
00000
Oops. 20 + 20 certainly doesn't equal 0. Enter the (a & b) << 1 term. This term represents the "carry" for each position. On the next iteration of the while loop, we add in the carry from the previous loop. So, if we go with the example we had before, we get:
# First iteration (a is 20, b is 20)
10100 ^ 10100 == 00000 # makes a 0
(10100 & 10100) << 1 == 101000 # makes b 40
# Second iteration:
000000 ^ 101000 == 101000 # Makes a 40
(000000 & 101000) << 1 == 0000000 # Makes b 0
Now b is 0, we are done, so return a. This algorithm works in general, not just for the specific cases I've outlined. Proof of correctness is left to the reader as an exercise ;)
What do the masks do?
All the masks are doing is ensuring that the value stays a 32-bit integer, because your code even has comments stating that a, b, and the return type are of type int. The maximum possible 32-bit signed int is 2147483647, so if you add 2 to this value, like you did in your example, the int overflows and you get a negative value. You have to force this in Python, because it doesn't respect the int boundary that other strongly typed languages like Java and C++ have defined. Consider the following:
def get_sum(a, b):
    while b:
        a, b = (a ^ b), (a & b) << 1
    return a
This is the version of getSum without the masks.
print get_sum(2147483647, 2)
outputs
2147483649
while
print Solution().getSum(2147483647, 2)
outputs
-2147483647
due to the overflow.
The moral of the story is the implementation is correct if you define the int type to only represent 32 bits.
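To see the final conversion in isolation, here is the return expression applied to the overflowed bit pattern from the example above (my illustration, not part of the original solution):

MAX_INT = 0x7FFFFFFF
MIN_INT = 0x80000000
a = 0x80000001                     # 32-bit pattern of 2147483647 + 2
print(~((a % MIN_INT) ^ MAX_INT))  # -2147483647, the two's-complement reading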
Here is a solution that works in every case.
Cases:
- -
- +
+ -
+ +
Solution:
Python's default int is not 32-bit; it is arbitrary precision. So, to prevent overflow and avoid running into an infinite loop, we use a 32-bit mask (0xffffffff) to limit the int size.
a, b = -1, -1
mask = 0xffffffff
while (b & mask):
    carry = a & b
    a = a ^ b
    b = carry << 1
print((a & mask) if b > 0 else a)
For me, Matt's solution got stuck in an infinite loop with the inputs Solution().getSum(-1, 1).
So here is another (much slower) approach based on math, using the identity 2**a * 2**b == 2**(a + b):

import math

def getSum(a: int, b: int) -> int:
    return int(math.log2(2**a * 2**b))
I have a list of integers which is continuously modified in a loop, and I need to tell whether its content repeats itself after a certain number of iterations, to break the loop.
If it doesn't repeat, the list will eventually shrink to [] or the loop terminates when a certain iteration limit is reached. My solution so far:
def modify(numlist, a, b):
    numlist[:] = [(x * a) for x in numlist]  # assign in place so the caller's list changes
    for i in range((len(numlist) - 1), 0, -1):
        if numlist[i] >= b: numlist[i - 1] += numlist[i] // b
        numlist[i] = numlist[i] % b
    numlist[0] %= b
    while numlist[-1] == 0:
        numlist.pop(-1)
        if numlist == []: break
numlist = [1, 2, 3, 4, 5]
listHistory = [numlist[:]]
a, b = someValue, anotherValue
n = 0
while numlist != [] and n <= limit:
    modify(numlist, int(a), int(b))
    if numlist in listHistory: break   # state seen before: it is periodic
    listHistory.append(numlist[:])     # store a snapshot, not a reference
    n += 1
limit can be very large (ca. 10**6 - 10**7), and checking the current numlist against all of its previous versions becomes really slow.
Is there a more efficient way to do this, or even a method to predetermine whether the modification is periodic from the list's initial content and the given a, b?
OK, got something.
Look at the last element in your list; let's call it m. What happens to it? It gets multiplied by a and then taken modulo b. It never gets mixed with any other element, so if a configuration of the list is to repeat itself, the following must hold:
m*a^n = m (mod b)
<==> a^n = 1 (mod b)
<==> a^(n+1) = a (mod b)
This is a problem where you can make use of Euler's theorem (the generalization of Fermat's little theorem):
If a and b are coprime, then
a^phi(b) = 1 (mod b)
where phi is Euler's totient function.
So this reduces the amount of list configurations which you have to store in your history drastically. You only have to store it every phi(b) steps.
I found an implementation of phi here:
Computing Euler's Totient Function
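For completeness, a simple trial-division totient would look something like this (my sketch, not the implementation behind that link):

def phi(n):
    # multiply n by (1 - 1/p) for every distinct prime p dividing n
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:                 # leftover prime factor
        result -= result // n
    return result

print(phi(36))  # 12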
UPDATE:
Ok, I found a quick solution if you were to do += list[i] % b instead of += list[i] // b. Otherwise you need b^4*phi(b) steps in the worst case
UPDATE2:
I rewrote the code in C (see below) to make it faster and implemented the "tortoise and the hare" algorithm proposed by @user2357112. This way I can check a few million loops per second, which should be much faster than the Python implementation.
I tried it for some different value combinations:
a    b    steps      b^4*phi(b)   (b/GCD(a,b))^4*phi(b/GCD(a,b))   ratio (col 5 / steps)
2    37   67469796   67469796     67469796                          1
3    37   33734898   67469796     67469796                          2
4    37   33734898   67469796     67469796                          2
5    37   67469796   67469796     67469796                          1
6    37   7496644    67469796     67469796                          9
7    37   16867449   67469796     67469796                          4
36   37   3748322    67469796     67469796                          18
2    36   39366      20155392     629856                            16
3    36   256        20155392     82944                             27648
4    36   19683      20155392     39366                             2
5    36   5038848    20155392     20155392                          4

So you see where this is going: the cycle length seems always to be a divisor of (b/GCD(a,b))^4 * phi(b/GCD(a,b)), so the worst case is (b/GCD(a,b))^4 * phi(b/GCD(a,b)) steps, as suspected.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void modify(int *, int, int);
void printl(int *);

int main(int argc, const char *argv[])
{
    int l[5] = {1, 2, 3, 4, 5};
    int lz[5] = {1, 2, 3, 4, 5};
    int i = 1, a, b, n;
    if (argc < 4) {
        printf("Not enough arguments!!\n");
        exit(1);
    }
    a = atoi(argv[1]);
    b = atoi(argv[2]);
    n = atoi(argv[3]);
    modify(l, a, b);
    while (i < n) {
        modify(l, a, b);   /* the hare takes two steps per iteration... */
        modify(l, a, b);
        modify(lz, a, b);  /* ...the tortoise takes one */
        i++;
        if (memcmp(l, lz, sizeof(l)) == 0) {
            printf("success!\n");
            break;
        }
        if (i % 1000000 == 0) printf("Step %d.000.000\n", i / 1000000);
    }
    printf("Final step: %d\n", i);
    printl(l);
    printl(lz);
    return 0;
}

void modify(int *li, int a, int b) {
    int i = 0;
    while (i <= 4) {
        li[i] *= a;
        i++;
    }
    i = 4;
    while (i >= 1) {
        if (li[i] >= b) {
            li[i-1] += li[i] / b;
        }
        li[i] = li[i] % b;
        i--;
    }
    li[0] = li[0] % b;
}

void printl(int *li) {
    printf("l=(%d,%d,%d,%d,%d)\n", li[0], li[1], li[2], li[3], li[4]);
}
Your list (which you really ought to rename, by the way) stores a number mod b**something in base b. Every run of modify multiplies the number by a, then truncates zeros at the end of the representation.
Call the number originally represented by the list n, and call the original length of the list l. If this process terminates, it will do so at the first iteration k such that b**l divides n * a**k, which happens if and only if all prime factors of b**l / gcd(n, b**l) are factors of a. This is simple to determine:
from math import gcd

def all_prime_factors_of_first_divide_second(a, b):
    while a != 1:
        factor = gcd(a, b)
        if factor == 1:
            return False
        while not a % factor:
            a //= factor
    return True
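Wired up to the notation of the paragraph above (my example, not a tested snippet):

# n: the number the list initially represents, l: the initial list length,
# a, b: as in the question
terminates = all_prime_factors_of_first_divide_second(b**l // gcd(n, b**l), a)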
First, please allow me to say: all lists are periodic, if you consider a large enough period.
That said, this might be a good place to use a bloom filter, EG:
https://pypi.python.org/pypi/drs-bloom-filter/
Bloom filters are sets that can perform set-membership tests pretty quickly and can add things to the set without actually storing the data of the element. This means they're probabilistic, but you can adjust the probability. So you might use a bloom filter test for a quick check, and upon detecting a match with the bloom filter, confirm the result using your slow, deterministic algorithm.
Usage looks like:
In [1]: import bloom_filter_mod
In [2]: bloom_filter = bloom_filter_mod.Bloom_filter(1000000, 0.01)
In [3]: for i in range(10):
   ...:     bloom_filter.add(i)
   ...:
In [4]: for i in range(0, 20, 2):
   ...:     if i in bloom_filter:
   ...:         print('{} present'.format(i))
   ...:
0 present
2 present
4 present
6 present
8 present
The 1000000 is the maximum number of elements you want to store in the filter, and the 0.01 is the maximum probability of a false positive when full.
So you could "store" each subsequence in the filter, and quickly detect recurrences.
There is a puzzle I am writing code to solve, which goes as follows.
Consider a binary vector of length n that is initially all zeros. You choose a bit of the vector and set it to 1. Now a process starts that sets to 1 the bit that is the greatest distance from any 1 bit (or an arbitrary choice among the furthest bits if there is more than one). This happens repeatedly, with the rule that no two 1 bits can be next to each other. It terminates when there is no more space to place a 1 bit. The goal is to place the initial 1 bit so that as many bits as possible are set to 1 on termination.
Say n = 2. Then wherever we set the bit we end up with exactly one bit set.
For n = 3, if we set the first bit we get 101 in the end. But if we set the middle bit we get 010 which is not optimal.
For n = 4, whichever bit we set we end up with two set.
For n = 5, setting the first gives us 10101 with three bits set in the end.
For n = 7, it seems we need to set the third bit to get 1010101.
I have written code to find the optimal value but it does not scale well to large n. My code starts to get slow around n = 1000 but I would like to solve the problem for n around 1 million.
#!/usr/bin/python
from __future__ import division
from math import *

def findloc(v):
    # n is the global vector length
    count = 0
    maxcount = 0
    id = -1
    for i in xrange(n):
        if (v[i] == 0):
            count += 1
        if (v[i] == 1):
            if (count > maxcount):
                maxcount = count
                id = i
            count = 0
    # Deal with vector ending in 0s
    if (2*count >= maxcount and count >= v.index(1) and count > 1):
        return n-1
    # Deal with vector starting in 0s
    if (2*v.index(1) >= maxcount and v.index(1) > 1):
        return 0
    if (maxcount <= 2):
        return -1
    return id - int(ceil(maxcount/2))

def addbits(v):
    id = findloc(v)
    if (id == -1):
        return v
    v[id] = 1
    return addbits(v)

# Set vector length
n = 21

max = 0
for i in xrange(n):
    v = [0]*n
    v[i] = 1
    v = addbits(v)
    score = sum([1 for j in xrange(n) if v[j] == 1])
    # print i, sum([1 for j in xrange(n) if v[j] == 1]), v
    if (score > max):
        max = score
print max
Latest answer (O(log n) complexity)
If we believe the conjectures by templatetypedef and Aleksi Torhamo (update: proof at the end of this post), there is a closed-form solution count(n) calculable in O(log n) (or O(1) if we assume logarithm and bit shifting are O(1)):
Python:
from math import log
def count(n): # The count, using position k conjectured by templatetypedef
k = p(n-1)+1
count_left = k/2
count_right = f(n-k+1)
return count_left + count_right
def f(n): # The f function calculated using Aleksi Torhamo conjecture
return max(p(n-1)/2 + 1, n-p(n-1))
def p(n): # The largest power of 2 not exceeding n
return 1 << int(log(n,2)) if n > 0 else 0
C++:
int log(int n){ // Integer logarithm, by counting the number of leading 0s
    return 31 - __builtin_clz(n);
}

int p(int n){ // The largest power of 2 not exceeding n
    if(n == 0) return 0;
    return 1 << log(n);
}

int f(int n){ // The f function calculated using Aleksi Torhamo's conjecture
    int val0 = p(n-1);
    int val1 = val0/2 + 1;
    int val2 = n - val0;
    return val1 > val2 ? val1 : val2;
}

int count(int n){ // The count, using position k conjectured by templatetypedef
    int k = p(n-1) + 1;
    int count_left = k/2;
    int count_right = f(n-k+1);
    return count_left + count_right;
}
This code can calculate the result for n=100,000,000 (and even for n=1e24 in Python!) correctly in no time[1].
I have tested the code with various values of n (using my O(n) solution as the standard; see the Old Answer section below), and it still seems correct.
This code relies on the two conjectures by templatetypedef and Aleksi Torhamo[2]. Anyone want to prove them? =D (Update 2: PROVEN)
[1] By "no time", I mean almost instantly.
[2] The conjecture by Aleksi Torhamo on the f function had been empirically verified for n <= 100,000,000.
Old answer (O(n) complexity)
I can return the count for n=1,000,000 (the result is 475712) in 1.358s (on my iMac) using Python 2.7. Update: it's 0.198s for n=10,000,000 in C++. =)
Here is my idea, which achieves O(n) time complexity.
The Algorithm
Definition of f(n)
Define f(n) as the number of bits that will be set in a bit vector of length n, assuming that the first and last bits are set (except for n=2, where only the first or last bit is set). So we know some values of f(n):
f(1) = 1
f(2) = 1
f(3) = 2
f(4) = 2
f(5) = 3
Note that this is different from the value we are looking for, since the initial bit might not be at the first or last position, as f(n) assumes. For example, we have f(7)=3 instead of 4.
Note that f can be calculated rather efficiently (amortized O(n) to calculate all values of f up to n) using the recurrence relations:
f(2n) = f(n)+f(n+1)-1
f(2n+1) = 2*f(n+1)-1
for n>=5, since the next bit set following the rule will be the middle bit, except for n=1,2,3,4. Then we can split the bit vector into two parts, each independent of the other, and so we can calculate the number of set bits as f(floor(n/2)+1) + f(ceil(n/2)) - 1, as illustrated below:
n=11               n=13
10000100001        1000001000001
<---->             <----->
f(6) <---->        f(7)  <----->
     f(6)                f(7)

n=12               n=14
100001000001       10000010000001
<---->             <----->
f(6) <----->       f(7)  <------>
     f(7)                f(8)
We have the -1 in the formula to exclude the double-counting of the middle bit.
Now we are ready to count the solution of the original problem.
Definition of g(n,i)
Define g(n,i) as the number of bits that will be set in a bit vector of length n, following the rules in the problem, where the initial bit is at the i-th position (1-based). Note that by symmetry the initial bit can be anywhere from the first bit up to the ceil(n/2)-th bit. For those cases, note that the first bit will be set before any bit between the first and the initial bit, and likewise for the last bit. Therefore the number of bits set in the first partition and the second partition is f(i) and f(n+1-i), respectively.
So the value of g(n,i) can be calculated as:
g(n,i) = f(i) + f(n+1-i) - 1
following the idea when calculating f(n).
Now, to calculate the final result is trivial.
Definition of g(n)
Define g(n) as the count being looked for in the original problem. We can then take the maximum of all possible i, the position of initial bit:
g(n) = max_{i=1..ceil(n/2)} ( f(i) + f(n+1-i) - 1 )
Python code:
import time

mem_f = [0,1,1,2,2]
mem_f.extend([-1]*(10**7)) # This will take around 40MB of memory

def f(n):
    global mem_f
    if mem_f[n] > -1:
        return mem_f[n]
    if n % 2 == 1:
        mem_f[n] = 2*f((n+1)/2) - 1
        return mem_f[n]
    else:
        half = n/2
        mem_f[n] = f(half) + f(half+1) - 1
        return mem_f[n]

def g(n):
    return max(f(i) + f(n+1-i) - 1 for i in range(1, (n+1)/2 + 1))

def main():
    while True:
        n = input('Enter n (1 <= n <= 10,000,000; 0 to stop): ')
        if n == 0: break
        start_time = time.time()
        print 'g(%d) = %d, in %.3fs' % (n, g(n), time.time()-start_time)

if __name__ == '__main__':
    main()
Complexity Analysis
Now, the interesting thing is, what is the complexity of calculating g(n) with the method described above?
We should first note that we iterate over n/2 values of i, the position of the initial bit, and in each iteration we call f(i) and f(n+1-i). Naive analysis leads to O(n * O(f(n))), but since we memoize f, each value f(i) is calculated at most once. So the complexity is actually O(n) plus the time required to calculate all values of f up to n.
So what's the complexity of initializing f(n)?
We can precompute every value of f up to n before calculating g(n). Thanks to the recurrence relations and the memoization, generating all values of f up to n takes O(n) time, and any subsequent call to f takes O(1) time.
So the overall complexity is O(n+n) = O(n), as evidenced by these running times on my iMac for n=1,000,000 and n=10,000,000:
> python max_vec_bit.py
Enter n (1 <= n <= 10,000,000; 0 to stop): 1000000
g(1000000) = 475712, in 1.358s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
>
> <restarted the program to remove the effect of memoization>
>
> python max_vec_bit.py
Enter n (1 <= n <= 10,000,000; 0 to stop): 10000000
g(10000000) = 4757120, in 13.484s
Enter n (1 <= n <= 10,000,000; 0 to stop): 6745231
g(6745231) = 3145729, in 3.072s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
And as a by-product of the memoization, calculations for smaller values of n will be much faster after the first call with a large n, as you can also see in the sample run. And with a language better suited for number crunching, such as C++, you might get a significantly faster running time.
I hope this helps. =)
The C++ code, for a performance improvement
The result in C++ is about 68x faster (measured by clock()):
> ./a.out
Enter n (1 <= n <= 10,000,000; 0 to stop): 1000000
g(1000000) = 475712, in 0.020s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
>
> <restarted the program to remove the effect of memoization>
>
> ./a.out
Enter n (1 <= n <= 10,000,000; 0 to stop): 10000000
g(10000000) = 4757120, in 0.198s
Enter n (1 <= n <= 10,000,000; 0 to stop): 6745231
g(6745231) = 3145729, in 0.047s
Enter n (1 <= n <= 10,000,000; 0 to stop): 0
Code in C++:
#include <cstdio>
#include <cstring>
#include <ctime>

int mem_f[10000001];

int f(int n){
    if(mem_f[n] > -1)
        return mem_f[n];
    if(n % 2 == 1){
        mem_f[n] = 2*f((n+1)/2) - 1;
        return mem_f[n];
    } else {
        int half = n/2;
        mem_f[n] = f(half) + f(half+1) - 1;
        return mem_f[n];
    }
}

int g(int n){
    int result = 0;
    for(int i = 1; i <= (n+1)/2; i++){
        int cnt = f(i) + f(n+1-i) - 1;
        result = (cnt > result ? cnt : result);
    }
    return result;
}

int main(){
    memset(mem_f, -1, sizeof(mem_f));
    mem_f[0] = 0;
    mem_f[1] = mem_f[2] = 1;
    mem_f[3] = mem_f[4] = 2;
    clock_t start, end;
    while(true){
        int n;
        printf("Enter n (1 <= n <= 10,000,000; 0 to stop): ");
        scanf("%d", &n);
        if(n == 0) break;
        start = clock();
        int result = g(n);
        end = clock();
        printf("g(%d) = %d, in %.3fs\n", n, result, ((double)(end-start))/CLOCKS_PER_SEC);
    }
}
Proof
Note: for the sake of keeping this (already very long) answer simple, I've skipped some steps in the proofs.
Conjecture of Aleksi Torhamo on the value of f
For n >= 1, prove that:
f(2^n + k) = 2^(n-1) + 1, for k = 1, 2, ..., 2^(n-1)   ...(1)
f(2^n + k) = k, for k = 2^(n-1) + 1, ..., 2^n   ...(2)
given f(0) = f(1) = f(2) = 1
The result above can be easily proven using induction on the recurrence relation, by considering the four cases:
Case 1: (1) for even k
Case 2: (1) for odd k
Case 3: (2) for even k
Case 4: (2) for odd k
Suppose we have the four cases proven for n. Now consider n+1.
Case 1:
f(2^(n+1) + 2i) = f(2^n + i) + f(2^n + i + 1) - 1, for i = 1, ..., 2^(n-1)
               = (2^(n-1) + 1) + (2^(n-1) + 1) - 1
               = 2^n + 1
Case 2:
f(2^(n+1) + 2i + 1) = 2*f(2^n + i + 1) - 1, for i = 0, ..., 2^(n-1) - 1
                    = 2*(2^(n-1) + 1) - 1
                    = 2^n + 1
Case 3:
f(2^(n+1) + 2i) = f(2^n + i) + f(2^n + i + 1) - 1, for i = 2^(n-1) + 1, ..., 2^n
               = i + (i + 1) - 1
               = 2i
Case 4:
f(2^(n+1) + 2i + 1) = 2*f(2^n + i + 1) - 1, for i = 2^(n-1) + 1, ..., 2^n - 1
                    = 2*(i + 1) - 1
                    = 2i + 1
So by induction the conjecture is proven.
Conjecture of templatetypedef on the best position
For n >= 1 and k = 1, ..., 2^n, prove that g(2^n + k) = g(2^n + k, 2^n + 1)
That is, prove that placing the first bit at the (2^n + 1)-th position gives the maximum number of bits set.
The proof:
First, we have
g(2^n + k, 2^n + 1) = f(2^n + 1) + f(k-1) - 1
Next, by the formula of f, we have the following equalities:
f(2^n + 1 - i) = f(2^n + 1), for i = -2^(n-1), ..., -1
f(2^n + 1 - i) = f(2^n + 1) - i, for i = 1, ..., 2^(n-2) - 1
f(2^n + 1 - i) = f(2^n + 1) - 2^(n-2), for i = 2^(n-2), ..., 2^(n-1)
and also the following inequalities:
f(k - 1 + i) <= f(k - 1), for i = -2^(n-1), ..., -1
f(k - 1 + i) <= f(k - 1) + i, for i = 1, ..., 2^(n-2) - 1
f(k - 1 + i) <= f(k - 1) + 2^(n-2), for i = 2^(n-2), ..., 2^(n-1)
and so we have:
f(2^n + 1 - i) + f(k - 1 + i) <= f(2^n + 1) + f(k - 1), for i = -2^(n-1), ..., 2^(n-1)
Now, note that we have:
g(2^n + k) = max_{i=1..ceil((2^n + k)/2)} ( f(i) + f(2^n + k + 1 - i) - 1 )
          <= f(2^n + 1) + f(k - 1) - 1
           = g(2^n + k, 2^n + 1)
And so the conjecture is proven.
So in a break with my normal tradition of not posting algorithms I don't have a proof for, I think I should mention that there's an algorithm that appears to be correct for numbers up to 50,000+ and runs in O(log n) time. This is due to Sophia Westwood, who I worked on this problem with for about three hours today. All credit for this is due to her. Empirically it seems to work beautifully, and it's much, much faster than the O(n) solutions.
One observation about the structure of this problem is that if n is sufficiently large (n ≥ 5), then if you put a 1 anywhere, the problem splits into two subproblems, one to the left of the 1 and one to the right. Although the 1s might be placed in the different halves at different times, the eventual placement is the same as if you solved each half separately and combined them back together.
The next observation is this: suppose you have an array of size 2^k + 1 for some k. In that case, suppose that you put a 1 on either side of the array. Then:
The next 1 is placed on the other side of the array.
The next 1 is placed in the middle.
You now have two smaller subproblems of size 2^(k-1) + 1.
The important part about this is that the resulting bit pattern is an alternating series of 1s and 0s. For example:
For 5 = 4 + 1, we get 10101
For 9 = 8 + 1, we get 101010101
For 17 = 16 + 1, we get 10101010101010101
The reason this matters is the following: suppose you have n total elements in the array, and let k be the largest possible value for which 2^k + 1 <= n. If you place the 1 at position 2^k + 1, then the left part of the array up to that position will end up getting tiled with alternating 1s and 0s, which puts a lot of 1s into the array.
What's not obvious is that placing the 1 bit there appears to yield an optimal solution for all numbers up to 50,000! I've written a Python script that checks this (using a recurrence relation similar to the one @justhalf used), and it seems to work well. The reason this fact is so useful is that it's really easy to compute this index. In particular, if 2^k + 1 <= n, then 2^k <= n - 1, so k <= lg(n - 1). Choosing ⌊lg(n - 1)⌋ as your value of k then lets you compute the bit index by computing 2^k + 1. This value of k can be computed in O(log n) time and the exponentiation can be done in O(log n) time as well, so the total runtime is Θ(log n).
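In code, the index choice is tiny (my sketch; bit_length stands in for the explicit logarithm):

def initial_position(n):
    # largest k with 2**k + 1 <= n, i.e. k = floor(lg(n - 1)); assumes n >= 2
    k = (n - 1).bit_length() - 1
    return 2**k + 1          # 1-based position of the starting 1

print(initial_position(23))  # 17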
The only issue is that I haven't formally proven that this works. All I know is that it's right for the first 50,000 values we've tried. :-)
Hope this helps!
I'll attach what I have. Same as yours, alas, time is basically O(n**3). But at least it avoids recursion (etc), so won't blow up when you get near a million ;-) Note that this returns the best vector found, not the count; e.g.,
>>> solve(23)
[6, 0, 11, 0, 1, 0, 0, 10, 0, 5, 0, 9, 0, 3, 0, 0, 8, 0, 4, 0, 7, 0, 2]
So it also shows the order in which the 1 bits were chosen. The easiest way to get the count is to pass the result to max().
>>> max(solve(23))
11
Or change the function to return maxsofar instead of best.
If you want to run numbers on the order of a million, you'll need something radically different. You can't even afford quadratic time for that (let alone this approach's cubic time). Unlikely to get such a huge O() improvement from fancier data structures - I expect it would require deeper insight into the mathematics of the problem.
def solve(n):
    maxsofar, best = 1, [1] + [0] * (n-1)
    # by symmetry, no use trying starting points in last half
    # (would be a mirror image).
    for i in xrange((n + 1)//2):
        v = [0] * n
        v[i] = count = 1
        # d21[i] = distance to closest 1 from index i
        d21 = range(i, 0, -1) + range(n-i)
        while 1:
            d, j = max((d, j) for j, d in enumerate(d21))
            if d >= 2:
                count += 1
                v[j] = count
                d21[j] = 0
                k = 1
                while j-k >= 0 and d21[j-k] > k:
                    d21[j-k] = k
                    k += 1
                k = 1
                while j+k < n and d21[j+k] > k:
                    d21[j+k] = k
                    k += 1
            else:
                if count > maxsofar:
                    maxsofar = count
                    best = v[:]
                break
    return best