Evenly distribute within a list (Google Foobar: Maximum Equality)

Evenly distribute within a list (Google Foobar: Maximum Equality) - python

This question comes from Google Foobar, and my code passes all but the last test, with the input/output hidden.
The prompt
In other words, choose two elements of the array, x[i] and x[j]
(i distinct from j) and simultaneously increment x[i] by 1 and decrement
x[j] by 1. Your goal is to get as many elements of the array to have
equal value as you can.
For example, if the array was [1,4,1] you could perform the operations
as follows:
Send a rabbit from the 1st car to the 0th: increment x[0], decrement
x[1], resulting in [2,3,1] Send a rabbit from the 1st car to the 2nd:
increment x[2], decrement x[1], resulting in [2,2,2].
All the elements are of the array are equal now, and you've got a
strategy to report back to Beta Rabbit!
Note that if the array was [1,2], the maximum possible number of equal
elements we could get is 1, as the cars could never have the same
number of rabbits in them.
Write a function answer(x), which takes the array of integers x and
returns the maximum number of equal array elements that we can get, by
doing the above described command as many times as needed.
The number of cars in the train (elements in x) will be at least 2,
and no more than 100. The number of rabbits that want to share a car
(each element of x) will be an integer in the range [0, 1000000].
My code
from collections import Counter
def most_common(lst):
data = Counter(lst)
return data.most_common(1)[0][1]
def answer(x):
"""The goal is to take all of the rabbits in list x and distribute
them equally across the original list elements."""
total = sum(x)
length = len(x)
# Find out how many are left over when distributing niavely.
div, mod = divmod(total, length)
# Because of the variable size of the list, the remainder
# might be greater than the length of the list.
# I just realized this is unnecessary.
while mod > length:
div += length
mod -= length
# Create a new list the size of x with the base number of rabbits.
result = [div] * length
# Distribute the leftovers from earlier across the list.
for i in xrange(mod):
result[i] += 1
# Return the most common element.
return most_common(result)
It runs well under my own testing purposes, handling one million tries in ten or so seconds. But it fails under an unknown input.
Have I missed something obvious, or did I make an assumption I shouldn't have?

Sorry, but your code doesn't work in my testing. I fed it [0, 0, 0, 0, 22] and got back a list of [5, 5, 4, 4, 4] for an answer of 3; the maximum would be 4 identical cars, with the original input being one such example. [4, 4, 4, 4, 6] would be another. I suspect that's your problem, and that there are quite a few other such examples in the data base.
For N cars, the maximum would be either N (if the rabbit population is divisible by the number of cars) or N-1. This seems so simple that I fear I'm missing a restriction in the problem. It didn't ask for a balanced population, just as many car populations as possible should be equal. In short:
def answer(census):
size = len(census)
return size if sum(census) % size == 0 else (size-1)

Related

foobar please-pass-the-coded-messages hidden test case not passing

I have been attempting google foobar and in the second level i got the task named please-pass-the-coded-messages. below is the task
==============================
You need to pass a message to the bunny workers, but to avoid detection, the code you agreed to use is... obscure, to say the least. The bunnies are given food on standard-issue plates that are stamped with the numbers 0-9 for easier sorting, and you need to combine sets of plates to create the numbers in the code. The signal that a number is part of the code is that it is divisible by 3. You can do smaller numbers like 15 and 45 easily, but bigger numbers like 144 and 414 are a little trickier. Write a program to help yourself quickly create large numbers for use in the code, given a limited number of plates to work with.
You have L, a list containing some digits (0 to 9). Write a function solution(L) which finds the largest number that can be made from some or all of these digits and is divisible by 3. If it is not possible to make such a number, return 0 as the solution. L will contain anywhere from 1 to 9 digits. The same digit may appear multiple times in the list, but each element in the list may only be used once.
Languages
=========
To provide a Java solution, edit Solution.java
To provide a Python solution, edit solution.py
Test cases
==========
Your code should pass the following test cases.
Note that it may also be run against hidden test cases not shown here.
-- Java cases --
Input:
Solution.solution({3, 1, 4, 1})
Output:
4311
Input:
Solution.solution({3, 1, 4, 1, 5, 9})
Output:
94311
-- Python cases --
Input:
solution.solution([3, 1, 4, 1])
Output:
4311
Input:
solution.solution([3, 1, 4, 1, 5, 9])
Output:
94311
Use verify [file] to test your solution and see how it does. When you are finished editing your code, use submit [file] to submit your answer. If your solution passes the test cases, it will be removed from your home folder.
i have tried a solution which is working very correct in my ide(note i wanted a solution without any library)
def solution(l):
# Your code here
if (len(l) == 1 and l[0] % 3 != 0) or (len(l) == 0):
return 0
number = formGreatestNumber(l)
remainder = number % 3
if remainder == 0:
result = formGreatestNumber(l)
return result
result = removeUnwanted(l, remainder)
return result
def formGreatestNumber(li):
li.sort(reverse=True) # descending order
li = [str(d) for d in li] # each digit in string
number = 0
if len(li) > 0:
number = int("".join(li)) # result
return number
def removeUnwanted(l, remainder):
possibleRemovals = [i for i in l if i % 3 == remainder]
if len(possibleRemovals) > 0:
l.remove(min(possibleRemovals))
result = formGreatestNumber(l)
return result
pairs = checkForTwo(l, remainder)
if len(pairs) > 0:
for ind in pairs:
l.remove(ind)
result = formGreatestNumber(l)
return result
else:
divisibleDigits = [d for d in l if d % 3 == 0]
if len(divisibleDigits) > 0:
result = formGreatestNumber(divisibleDigits)
return result
else:
return 0
def checkForTwo(l, remainder): # check of (sum of any two pairs - remainder) is divisible by 3
result = []
for i in range(len(l)):
for j in range(i+1, len(l)):
if ((l[i]+l[j])-remainder) % 3 == 0:
result.append(l[i])
result.append(l[j])
return result
return []
print(solution([]))
print(solution([1]))
print(solution([9]))
print(solution([3, 1, 4, 1, 9, 2, 5, 7]))
however it is on verifying showing-
Verifying solution...
Test 1 passed!
Test 2 passed!
Test 3 failed [Hidden]
Test 4 passed! [Hidden]
Test 5 passed! [Hidden]
so where is the error i am not noticing and is there any other way without any library like itertools?

I won't give away the code and spoil the fun for you, I'll perhaps try to explain the intuition.
About your code, I think the (2nd part of) the function removeUnwanted() is problematic here.
Let's see.
So first off, you'd arrange the input digits into a single number, in order from largest to smallest, which you've already done.
Then if the number formed isn't divisible by 3, try removing the smallest digit.
If that doesn't work, reinsert the smallest digit and remove the 2nd smallest digit, and so on.
Once you're done with removing all possible digits one at a time, try removing digits two at a time, starting with the two smallest.
If any of these result in a number that is divisible by 3, you're done.
Observe that you'll never need to remove more than 2 digits for this problem. The only way it's impossible to form the required number is if there are 2 or lesser digits and they are both either in the set {1,4,7} or {2,5,8}.
Edit: More about your code -
The initial part of your removeUnwanted() looks okay where you check if there's a single digit in the number which can be removed, removing the minimum from the choice of single digits and getting the answer.
I reckon the problem lies in your function checkForTwo(), which you call subsequently in removeUnwanted.
When you're passing the list to checkForTwo(), observe that the list is actually sorted in the decreasing order. This is because li.sort(reverse=True) in your function formGreatestNumber() sorted the list in place, which means the content of list l was sorted in descending order too.
And then in checkForTwo(), you try to find a pair that satisfies the required condition, but you're looping from the biggest 2 pairs that can possibly be removed. i starts from 0 and j starts from i+1 which is 1, and since your list is in descending order, you're trying to remove the biggest 2 elements possible.
A quick fix would be to sort the list in ascending order and then proceed further iterate through the list in reverse order, because since the list is sorted in descending order already, reverse iteration gives you the list in ascending order and saves us from re-sorting which would normally cost an additional O(NlogN) time.

Fast way to check consecutive subsequences for total

I have a list (up to 10,000 long) of numbers 0, 1, or 2.
I need to see how many consecutive subsequences have a total which is NOT 1. My current method is to for each list do:
cons = 0
for i in range(seqlen+1):
for j in range(i+1, seqlen+1):
if sum(twos[i:j]) != 1:
cons += 1
So an example input would be:
[0, 1, 2, 0]
and the output would be
cons = 8
as the 8 working subsequences are:
[0] [2] [0] [1,2] [2, 0] [0, 1, 2] [1, 2, 0] [0, 1, 2, 0]
The issue is that simply going through all these subsequences (the i in range, j in range) takes almost more time than is allowed, and when the if statement is added, the code takes far too long to run on the server. (To be clear, this is only a small part of a larger problem, I'm not just asking for the solution to an entire problem). Anyway, is there any other way to check faster? I can't think of anything that wouldn't result in more operations needing to happen every time.

I think I see the problem: your terminology is incorrect / redundant. By definition, a sub-sequence is a series of consecutive elements.
Do not sum every candidate. Instead, identify every candidate whose sum is 1, and then subtract that total from the computed quantity of all sub-sequences (simple algebra).
All of the 1-sum candidates are of the regular expression form 0*10*: a 1 surrounded by any quantity of 0s on either or both sides.
Identify all such maximal-length strings. FOr instance, in
210002020001002011
you will pick out 1000, 000100, 01, and 1. For each string compute the quantity of substrings that contain the 1 (a simple equation on the lengths of the 0s on each side). Add up those quantities. Subtract from the total for the entire input. There's you answer.

Use sliding window technique to solve these type of problem. Take two variable to track first and last to track the scope of window. So you start with sum equal to first element. If the sum is larger than required value you subtract the 'first' element from sum and increment sum by 1. If the sum is smaller than required you add next element of 'last' pointer and increment last by 1. Every time sum is equal to required increment some counter.
As for NOT, count number of sub-sequence having '1' sum and then subtract from total number of sub-sequence possible, i.e. n * (n + 1) / 2

Sample Online Data Algorithm Analysis [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I am going through a book called "Elements of Programming Interview" and have gotten stuck at the following problem:
Implement an algorithm that takes as input an array of distinct
elements and a size, and returns a subset of the given size of the
array elements. All subsets should be equally likely. Return the
result in input array itself.
The solution they provide below is:
import random
def random_sampling(k, A):
for i in range(k):
# Generate a random index in [i, len(A) - 1].
r = random.randint(i, len(A) - 1)
A[i], A[r] = A[r], A[i]
A = [3, 7, 5, 11]
k = 3
print(random_sampling(k, A))
I so not understand what the authors are trying to do intuitively. Their explanation is below
Another approach is to enumerate all subsets of size k and then select
one at random from these. Since there are (n to k) subsets of size k,
the time and space complexity are huge. The key to efficiently
building a random subset of size exactly k is to first build one of
size k - 1 and then adding one more element, selected randomly from
the rest. The problem is trivial when k = 1. We make one call to the
random number generator, take the returned value mod n (call it r),
and swap A[0] with A[r]. The entry A[0] now holds the result.
For k > 1, we begin by choosing one element at random as above and we
now repeat the same process with n - 1 element sub-array A[1, n -1].
Eventually, the random subset occupies the slots A[0, k - 1] and the
remaining elements are in the last n - k slots.
Intuitively, if all subsets of size k are equally likely, then the
construction process ensures that the subset of size k + 1 are also
equally likely. A formal proof for this uses mathematical induction -
the induction hypothesis is that every permutation of every size k
subset of A is equally likely to be in A[0, k -1].
As a concrete example, let the input be A = <3, 7, 5, 11> and the size
be 3. In the first iteration, we use the random number generator to
pick a random integer in the interval [0,3]. Let the returned random
number be 2. We swap A[0] with A[2] - now the array is <5, 7, 3, 11>.
Now we pick a random integer in the interval [1, 3]. Let the returned
random number be 3. We swap A[1] with A[3] - now the resulting array
is <5, 11, 3, 7>. Now we pick a random integer in the interval [2,3].
Let the returned random number be 2. When we swap A[2] with itself the
resulting array is unchanged. The random subset consists of he first
three entries, ie., {5, 11, 3}.
Sorry for the long text; my questions are this
What is the key to efficiency they are referring to? Its not clicking in my head
What did they mean by "eventually, the random subset occupies the slots A[0, k-1] and the remaining elements are in the last n - k slots"
is there a clear reason why "every permutation of every size k subset of A is equally likely to be in A[0, k - 1]"?
Can you explain the theory behind the algorithm in clearer terms?
What is the return of the algorithm supposed to be?
thanks

an intuitive solution might be
def random_sampling(k, A):
subset = []
selected = set()
for i in range(k):
index = random.randint(0, len(A) - 1)
while index in selected:
index = random.randint(0, len(A) - 1)
selected.add(index)
subset.append([A[index]])
return subset
but its not clear that every k subset has equal probability (because for the same k you may use different number of randoms on different ranges)
so a solution that fit the probability condition will be
import itertools as it
def random_sampling(k, A):
index_posibilities = [i for i in it.combinations(A,k)] #very expansive action
index = random.randint(0, len(index_posibilities) - 1)
selected = []
for i in index:
selected.append(A[i])
return selected
so the solution they gave makes sure you use the same procedure of randoms for every set of k elements without the brute force above
the order of the list is now, first k elements are these we selected, the rest of the list are the remaining items
this is the induction assumption, I assume that every set in length k-1 has the same probability and proof it for set of length k.
an efficient way to make sure the same probability for every k size sub set, is to do exactly the same steps to produce it
no return value because the list is being changed in the function is also changed in main, the subset is the first k elements of the list after the function being called

Sorting Technique Python

I'm trying to create a sorting technique that sorts a list of numbers. But what it does is that it compares two numbers, the first being the first number in the list, and the other number would be the index of 2k - 1.
2^k - 1 = [1,3,7, 15, 31, 63...]
For example, if I had a list [1, 4, 3, 6, 2, 10, 8, 19]
The length of this list is 8. So the program should find a number in the 2k - 1 list that is less than 8, in this case it will be 7.
So now it will compare the first number in the random list (1) with the 7th number in the same list (19). if it is greater than the second number, it will swap positions.
After this step, it will continue on to 4 and the 7th number after that, but that doesn't exist, so now it should compare with the 3rd number after 4 because 3 is the next number in 2k - 1.
So it should compare 4 with 2 and swap if they are not in the right place. So this should go on and on until I reach 1 in 2k - 1 in which the list will finally be sorted.
I need help getting started on this code.
So far, I've written a small code that makes the 2k - 1 list but thats as far as I've gotten.
a = []
for i in range(10):
a.append(2**(i+1) -1)
print(a)
EXAMPLE:
Consider sorting the sequence V = 17,4,8,2,11,5,14,9,18,12,7,1. The skipping
sequence 1, 3, 7, 15, … yields r=7 as the biggest value which fits, so looking at V, the first sparse subsequence =
17,9, so as we pass along V we produce 9,4,8,2,11,5,14,17,18,12,7,1 after the first swap, and
9,4,8,2,1,5,14,17,18,12,7,11 after using r=7 completely. Using a=3 (the next smaller term in the skipping
sequence), the first sparse subsequence = 9,2,14,12, which when applied to V gives 2,4,8,9,1,5,12,17,18,14,7,11, and the remaining a = 3 sorts give 2,1,8,9,4,5,12,7,18,14,17,11, and then 2,1,5,9,4,8,12,7,11,14,17,18. Finally, with a = 1, we get 1,2,4,5,7,8,9,11,12,14,17,18.
You might wonder, given that at the end we do a sort with no skips, why
this might be any faster than simply doing that final step as the only step at the beginning. Think of it as a comb
going through the sequence -- notice that in the earlier steps we’re using course combs to get distant things in the
right order, using progressively finer combs until at the end our fine-tuning is dealing with a nearly-sorted sequence
needing little adjustment.
p = 0
x = len(V) #finding out the length of V to find indexer in a
for j in a: #for every element in a (1,3,7....)
if x >= j: #if the length is greater than or equal to current checking value
p = j #sets j as p
So that finds what distance it should compare the first number in the list with but now i need to write something that keeps doing that until the distance is out of range so it switches from 3 to 1 and then just checks the smaller distances until the list is sorted.

The sorting algorithm you're describing actually is called Combsort. In fact, the simpler bubblesort is a special case of combsort where the gap is always 1 and doesn't change.
Since you're stuck on how to start this, here's what I recommend:
Implement the bubblesort algorithm first. The logic is simpler and makes it much easier to reason about as you write it.
Once you've done that you have the important algorithmic structure in place and from there it's just a matter of adding gap length calculation into the mix. This means, computing the gap length with your particular formula. You'll then modifying the loop control index and the inner comparison index to use the calculated gap length.
After each iteration of the loop you decrease the gap length(in effect making the comb shorter) by some scaling amount.
The last step would be to experiment with different gap lengths and formulas to see how it affects algorithm efficiency.

Pick N distinct items at random from sequence of unknown length, in only one iteration

I am trying to write an algorithm that would pick N distinct items from an sequence at random, without knowing the size of the sequence in advance, and where it is expensive to iterate over the sequence more than once. For example, the elements of the sequence might be the lines of a huge file.
I have found a solution when N=1 (that is, "pick exactly one element at random from a huge sequence"):
import random
items = range(1, 10) # Imagine this is a huge sequence of unknown length
count = 1
selected = None
for item in items:
if random.random() * count < 1:
selected = item
count += 1
But how can I achieve the same thing for other values of N (say, N=3)?

If your sequence is short enough that reading it into memory and randomly sorting it is acceptable, then a straightforward approach would be to just use random.shuffle:
import random
arr=[1,2,3,4]
# In-place shuffle
random.shuffle(arr)
# Take the first 2 elements of the now randomized array
print arr[0:2]
[1, 3]
Depending upon the type of your sequence, you may need to convert it to a list by calling list(your_sequence) on it, but this will work regardless of the types of the objects in your sequence.
Naturally, if you can't fit your sequence into memory or the memory or CPU requirements of this approach are too high for you, you will need to use a different solution.

Use reservoir sampling. It's a very simple algorithm that works for any N.
Here is one Python implementation, and here is another.

Simplest I've found is this answer in SO, improved a bit below:
import random
my_list = [1, 2, 3, 4, 5]
how_big = 2
new_list = random.sample(my_list, how_big)
# To preserve the order of the list, you could do:
randIndex = random.sample(range(len(my_list)), how_big)
randIndex.sort()
new_list = [my_list[i] for i in randIndex]

If you have python version of 3.6+ you can use choices
from random import choices
items = range(1, 10)
new_items = choices(items, k = 3)
print(new_items)
[6, 3, 1]

#NPE is correct, but the implementations that are being linked to are sub-optimal and not very "pythonic". Here's a better implementation:
def sample(iterator, k):
"""
Samples k elements from an iterable object.
:param iterator: an object that is iterable
:param k: the number of items to sample
"""
# fill the reservoir to start
result = [next(iterator) for _ in range(k)]
n = k - 1
for item in iterator:
n += 1
s = random.randint(0, n)
if s < k:
result[s] = item
return result
Edit As #panda-34 pointed out the original version was flawed, but not because I was using randint vs randrange. The issue is that my initial value for n didn't account for the fact that randint is inclusive on both ends of the range. Taking this into account fixes the issue. (Note: you could also use randrange since it's inclusive on the minimum value and exclusive on the maximum value.)

Following will give you N random items from an array X
import random
list(map(lambda _: random.choice(X), range(N)))

It should be enough to accept or reject each new item just once, and, if you accept it, throw out a randomly chosen old item.
Suppose you have selected N items of K at random and you see a (K+1)th item. Accept it with probability N/(K+1) and its probabilities are OK. The current items got in with probability N/K, and get thrown out with probability (N/(K+1))(1/N) = 1/(K+1) so survive through with probability (N/K)(K/(K+1)) = N/(K+1) so their probabilities are OK too.
And yes I see somebody has pointed you to reservoir sampling - this is one explanation of how that works.

As aix mentioned reservoir sampling works. Another option is generate a random number for every number you see and select the top k numbers.
To do it iteratively, maintain a heap of k (random number, number) pairs and whenever you see a new number insert to the heap if it is greater than smallest value in the heap.

This was my answer to a duplicate question (closed before I could post) that was somewhat related ("generating random numbers without any duplicates"). Since, it is a different approach than the other answers, I'll leave it here in case it provides additional insight.
from random import randint
random_nums = []
N = # whatever number of random numbers you want
r = # lower bound of number range
R = # upper bound of number range
x = 0
while x < N:
random_num = randint(r, R) # inclusive range
if random_num in random_nums:
continue
else:
random_nums.append(random_num)
x += 1
The reason for the while loop over the for loop is that it allows for easier implementation of non-skipping in random generation (i.e. if you get 3 duplicates, you won't get N-3 numbers).

There's one implementation from the numpy library.
Assuming that N is smaller than the length of the array, you'd have to do the following:
# my_array is the array to be sampled from
assert N <= len(my_array)
indices = np.random.permutation(N) # Generates shuffled indices from 0 to N-1
sampled_array = my_array[indices]
If you need to sample the whole array and not just the first N positions, then you can use:
import random
sampled_array = my_array[random.sample(len(my_array), N)]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.