Find majority elements that appear more than n/3 times - Python

I am working on the algorithm puzzle below and debugging the solution that follows, which works for a few test cases. My question is: how can we guarantee that an element appearing more than n/3 times always ends up with a positive count? There are up to 2n/3 other elements which could push its count down. Yet I tried it and it always works on my samples. If anyone could help clarify, that would be great.
Here are the problem statement and my code with test cases:
Given an integer array of size n, find all elements that appear more than ⌊ n/3 ⌋ times. The algorithm should run in linear time and in O(1) space.
def majorityElement(nums):
    if not nums:
        return []
    count1, count2, candidate1, candidate2 = 0, 0, 0, 0
    for n in nums:
        if n == candidate1:
            count1 += 1
        elif n == candidate2:
            count2 += 1
        elif count1 == 0:
            candidate1, count1 = n, 1
        elif count2 == 0:
            candidate2, count2 = n, 1
        else:
            count1, count2 = count1 - 1, count2 - 1
    # verify the (at most two) surviving candidates against the full list
    return [n for n in (candidate1, candidate2) if nums.count(n) > len(nums) // 3]

if __name__ == "__main__":
    # print(majorityElement([1,2,1,3,1,5,6]))
    print(majorityElement([2,3,1,2,1,3,1,5,5,1,6]))
thanks in advance,
Lin

Conceptually, we repeatedly apply a reduction operation to the list that involves deleting three pairwise distinct items. This particular code does reductions online, so that the reduced list so far can be described by two different elements and their corresponding counts (because if there were a third element distinct from the other two, then we could reduce). At the end, we consider at most two elements for occurring more than n/3 times.
The interesting part of the correctness proof is a lemma that, whenever we perform this reduction operation, any element that occurred more than n/3 times in the old list occurs more than n'/3 times in the new list, where n is the length of the old list and n' = n-3 is the length of the new list. This ensures by induction that the final list contains all elements occurring more than n/3 times in the initial list (but of course the final list contains only two distinct elements).
The proof of the lemma is that, if an item occurs k times out of n in the old list, then at worst it occurs k-1 times out of n-3 in the new list, and if k/n > 1/3, then
$$
\begin{aligned}
\frac{k-1}{n-3} &= \frac{(k-1)\,n}{(n-3)\,n} \\
&= \frac{k\,(n-3) + 3k - n}{(n-3)\,n} \\
&= \frac{k}{n} + \frac{3\,(k/n - 1/3)}{n-3} \\
&> \frac{1}{3}.
\end{aligned}
$$
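The reduction described above can also be run offline to see the lemma at work. The following is only an illustrative sketch: the names reduce_triples and majority_candidates are made up here, and it uses a Counter, so it does not meet the O(1)-space requirement of the puzzle. It repeatedly deletes three pairwise distinct items and then verifies the survivors against the original list.

```python
from collections import Counter


def reduce_triples(nums):
    """Delete three pairwise distinct items at a time until at most
    two distinct values remain; return the surviving counts."""
    counts = Counter(nums)
    while len(counts) >= 3:
        # pick any three distinct values and remove one copy of each
        for value in list(counts)[:3]:
            counts[value] -= 1
            if counts[value] == 0:
                del counts[value]
    return counts


def majority_candidates(nums):
    """Survivors of the reduction that really occur more than n/3 times."""
    survivors = reduce_triples(nums)
    return sorted(v for v in survivors if nums.count(v) > len(nums) // 3)


print(majority_candidates([2, 3, 1, 2, 1, 3, 1, 5, 5, 1, 6]))
```

By the lemma, every element occurring more than n/3 times must still be present after the reductions, which is why checking only the survivors is enough.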

Related

All possible ways of adding up two numbers in a range so that they sum to a given number

I was given a range n and a number k. Count the possible ways in which two (not identical) numbers in that range add up to k. Can this be done without nested loops?
Here's my approach. The only issue is that I'm using a nested loop, which takes time and is not efficient. Opposite pairs like (A, B) and (B, A) still count as 1.
n, k = int(input()), int(input())
cnt = 0
for i in range(1, n+1):
    for s in range(1, n+1):
        if i == 1 and s == 1 or i == n+1 and s==n+1:
            pass
        else:
            if i+s==k:
                cnt += 1
print(int(cnt/2))
example inputs (first line is n, second is k)
8
5
Explanation: the pairs are (1, 4) and (2, 3), so I should be printing 2.
You only need a single loop for this:
n = int(input('N: '))  # range 1 to n
k = int(input('K: '))
r = set(range(1, n+1))
c = 0
while r:
    if k - r.pop() in r:
        c += 1
print(c)
If I understood you correctly, a single while loop counting up towards k is enough:
counter = 0
while counter<min(n,k)/2:
    if counter+(k-counter) == k: # This is always true actually...
        print(counter,k-counter)
    counter+=1
Starting from 0 and going up to k, the pairs are counter and k - counter (the complement to k, i.e. the result of subtracting the counter from k).
We only need to count up to the smaller of n and k, because numbers bigger than k cannot add up to k.
Actually, we should count up to half of that, because we get symmetric results after that.
So, considering you don't want to print each pair, it's actually:
count = int(min(n,k)//2)
Why iterate and check number combinations when you can derive the count of valid pairs mathematically from n and k?
Depending on whether n or k is larger, the number of pairs can be calculated directly.
Every number i within the range 1 to n has a matching partner k-i,
and depending on which of n and k is greater, we need to check that both i and k-i are within the range and not equal.
For the n>=k case the valid range is from 1 to k-1,
and for the other case the valid range is from k-n to n.
The count of a range a to b is b-a+1.
Since the pairs are symmetrical in both cases, this count should be halved.
So the entire code becomes
n = int(input())
k = int(input())
if n >= k:
    print(int((k-1)/2))
if n < k:
    print(int((2*n-(k-1))/2))
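A closed form like this is easy to cross-check against brute force. The sketch below is one possible way to do it, not the answerer's code: the names count_pairs and count_pairs_brute are made up for illustration, and the result is clamped at zero on the assumption that k may be too large for any pair to exist (for example k > 2n).

```python
def count_pairs(n, k):
    """Count unordered pairs (i, k - i) with 1 <= i < k - i <= n."""
    lo = max(1, k - n)         # k - i <= n  means  i >= k - n
    hi = min(n, (k - 1) // 2)  # i < k - i   means  i <= (k - 1) // 2
    return max(0, hi - lo + 1)  # clamp: no valid i when the range is empty


def count_pairs_brute(n, k):
    """Reference implementation with nested loops, for checking only."""
    return sum(1
               for i in range(1, n + 1)
               for j in range(i + 1, n + 1)
               if i + j == k)


print(count_pairs(8, 5))  # the example from the question: (1, 4) and (2, 3)
```

Comparing the two functions over a grid of small n and k values is a quick way to catch off-by-one mistakes in the range bounds.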
A problem of combinatorics. The following code uses python's built-in library to generate all possible combinations
from itertools import combinations
n = 10
k = 5
n_range = [i for i in range(1, n+1)]
result = []
for i in n_range:
    n_comb = combinations(n_range, i)
    for comb in n_comb:
        if sum(comb) == k:
            result.append(comb)
print(result)

Incorrect output Project Euler #50

Project Euler problem 50 reads as follows:
The prime 41, can be written as the sum of six consecutive primes:
41 = 2 + 3 + 5 + 7 + 11 + 13
This is the longest sum of consecutive primes that adds to a prime below one-hundred.
The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.
Which prime, below one-million, can be written as the sum of the most consecutive primes?
In my approach I pregenerate a list of primes using the sieve of Eratosthenes. Then,
in the function itself, I keep adding succeeding elements of my prime number list,
and each time I do that I check whether the sum itself is prime. If it is, I keep track of it as the biggest one and return it. That should work, I guess? Obviously the answer is incorrect, but the interesting thing is that when I change the sieve to generate primes below 100000, it doesn't give an index error but gives another result.
from algorithms import gen_primes
primes = [i for i in gen_primes(1000000)]
def main(n):
    idx, total, maximum = 0, 0, 0
    while total < n:
        total += primes[idx]
        idx += 1
        if total in primes:
            maximum = total
    return maximum
print(main(1000000))
Your program doesn't solve the general problem: you always start your list of consecutive primes at the lowest, 2. Thus, what you return is the longest consecutive list starting at 2, rather than any consecutive list of primes.
In short, you need another loop ...
start_idx = 0
while start_idx < len(primes) and best_len*primes[start_idx] < n:
    # find longest list starting at primes[start_idx]
    start_idx += 1
In case it's any help, the successful sequence begins between 1500 and 2000.
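To make the extra loop concrete, here is a minimal sketch that tries every starting prime and uses prefix sums to get each window sum in constant time. It is one possible approach rather than the answerer's code; primes_below and longest_prime_sum are names invented for this example, and the sieve stands in for the asker's gen_primes helper, which is not shown.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: all primes strictly below limit."""
    if limit < 3:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def longest_prime_sum(limit):
    """Return (prime, length) for the longest run of consecutive primes
    whose sum is itself a prime below limit."""
    primes = primes_below(limit)
    prime_set = set(primes)  # O(1) membership instead of scanning a list
    prefix = [0]             # prefix[i] = sum of the first i primes
    for p in primes:
        prefix.append(prefix[-1] + p)

    best_len, best_prime = 0, 0
    for start in range(len(primes)):
        # a longer run from here has best_len + 1 terms, each >= primes[start]
        if primes[start] * (best_len + 1) >= limit:
            break
        # only consider windows longer than the best one found so far
        for end in range(start + best_len + 1, len(prefix)):
            total = prefix[end] - prefix[start]
            if total >= limit:
                break
            if total in prime_set:
                best_len, best_prime = end - start, total
    return best_prime, best_len


print(longest_prime_sum(100))
print(longest_prime_sum(1000))
```

The two calls correspond to the examples quoted in the problem statement (six terms summing to 41, and 21 terms summing to 953), which makes them a convenient sanity check before trying one million.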

Check if a number is found x amount of times consecutively in a 2d list

I want to know whether a number in a list is found j times consecutively. This is my list:
list=[[1, 1, 1,1],
      [0, 0, 0,0],
      [2, 2, 2,2],
      [2, 2, 2,2]]
And this is what I wrote:
def alignment(list,n,j):
    for y in range (n):
        for x in range (n-j):
            counter = 0
            for z in range(j):
                if list[y][x]== list[y][x+z]:
                    counter+=1
            if counter == j :
                return True
But this function checks whether any number is found consecutively. I want to add another parameter to it so I can specify which number to look for in the list.
n means there are n rows and columns, and j is how many times the number needs to be found.
Your requirements are unclear. However, this would be a slightly modified version of your code which would yield what I believe you're seeking.
target is the number for which you want to know if there are j consecutive entries.
def alignment(list,n,j,target):
    for y in range (n):
        # n-j+1 start positions, so a run ending at the last column is checked too
        for x in range (n-j+1):
            counter = 0
            if list[y][x] == target:
                for z in range(j):
                    if list[y][x]== list[y][x+z]:
                        counter+=1
            if counter == j :
                return True
    return False
def alignment(nums,j,target):
    for row in nums:  # get each row
        counter = 0
        for i in row:  # get each number
            if i != target:  # check if some other number was gotten
                if counter == j:
                    return True
                counter = 0  # reset counter
                continue
            counter += 1
        if counter == j:
            return True
    return False
No need for the n argument.
There are a few problems with your code:
- the n parameter is not needed, you can get the size of the list by using len(list)
- you should not use list as a variable name, as it shadows the builtin list function
- with for x in range (n-j) you are assuming that each sublist has the same number of elements as the parent list
- your function also returns True if the number appears more than j times in a row
- you are doing lots of double work by using three loops instead of just two
You can fix this, and also add the parameter for the number to be repeated, as shown in the other answers. However, using just loops and conditions, the resulting code will be very unwieldy.
Instead, you can create a function as you describe using any and itertools.groupby. groupby groups equal consecutive numbers, then you just have to check the length of those groups and see if any is long enough, and whether it's the correct number, for any of the sublists.
import itertools

def alignment(lst, num, count):
    return any(any(n == num and len(list(g)) == count
                   for n, g in itertools.groupby(l))
               for l in lst)
This will return True if num appears exactly count times consecutively in any of the sublists.
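To see what groupby actually produces, it can help to look at the run lengths directly. The sketch below is a small illustration with made-up helper names (run_lengths, has_run). Note that has_run uses >= and so checks for at least `length` consecutive occurrences, which differs from the exact match above.

```python
from itertools import groupby


def run_lengths(row):
    """Collapse a row into (value, run length) pairs, in order."""
    return [(value, len(list(group))) for value, group in groupby(row)]


def has_run(rows, target, length):
    """True if target appears at least `length` times in a row
    within any single sublist."""
    return any(value == target and count >= length
               for row in rows
               for value, count in run_lengths(row))


grid = [[1, 1, 1, 1],
        [0, 0, 0, 0],
        [2, 2, 0, 2]]
print(run_lengths(grid[2]))
print(has_run(grid, 2, 3))
```

Switching `>=` to `==` gives the "exactly count times" behaviour instead.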

Calculating Polygonal Numbers Taking A While To Calculate

I've created a function which, hopefully, creates a list of numbers that are both pentagonal and square.
Here is what I've got so far:
def sqpent(n):
    i = 0
    list = []
    while n >= 0:
        if n == 0:
            list.append(0)
        elif n == 1:
            list.append(1)
        elif (i*i == (i*(3*i-1)//2)):
            list.append(i)
            n -= 1
        i += 1
But when it gets past the first two numbers it seems to be taking a while to do so...
You have two issues: the first is that the special-casing for n==0 and n==1 doesn't decrease n, so it goes into an infinite loop. The special-casing isn't really needed and can be dropped.
The second, and more significant one, is that in the test i*i == (i*(3*i-1)//2) you are assuming that the index i will be the same for the square and pentagonal number. But this will only happen for i==0 and i==1, so you won't find values past that.
I suggest:
Iterate over i instead of n to make things simpler.
Take the ith pentagonal number and check if it is a square number (e.g. int(sqrt(x))**2 == x).
Stop when you've reached n numbers.
Thanks to @interjay's advice, I came up with this answer, which works perfectly:
import math
def sqpent(n):
    counter = 0
    i = 0
    l = []
    while counter < n:
        x = (i*(3*i-1)//2)
        #print(x)
        if(int(math.sqrt(x))**2 == x):
            #print("APPENDED: " + str(x))
            l.append(x)
            counter += 1
        i += 1
    return l
For an explanation:
It iterates over a value i and computes the ith pentagonal number. Then it checks whether that number is a square, and if so appends it to a list, which I ultimately return.
It does this until the counter reaches the number of items you want in the list.
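One caveat with the square test is that math.sqrt works in floating point, which can misjudge very large integers. A possible variant, assuming Python 3.8 or later where math.isqrt is available, keeps everything in exact integer arithmetic. The names is_square and square_pentagonal are chosen here for illustration.

```python
import math


def is_square(x):
    """Exact perfect-square test using the integer square root."""
    r = math.isqrt(x)
    return r * r == x


def square_pentagonal(count):
    """Return the first `count` numbers that are both pentagonal and square."""
    found = []
    i = 0
    while len(found) < count:
        pentagonal = i * (3 * i - 1) // 2  # i-th pentagonal number
        if is_square(pentagonal):
            found.append(pentagonal)
        i += 1
    return found


print(square_pentagonal(4))
```

The matches become sparse very quickly, so asking for more than a handful of values with this linear scan takes a long time.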

Using Python for quasi randomization

Here's the problem: I want to choose randomly between two elements n times (say 0 or 1), so that my final list has n/2 zeros and n/2 ones. I tend to get this kind of result: [0 1 0 0 0 1 0 1 1 1 1 1 1 0 0, and so on up to n]. The problem is that I don't want the same number to appear 4 or 5 times in a row so often. I know that I could use a quasi-randomization procedure, but I don't know how to do so (I'm using Python).
To guarantee that there will be the same number of zeros and ones you can generate a list containing n/2 zeros and n/2 ones and shuffle it with random.shuffle.
For small n, if you aren't happy that the result passes your acceptance criteria (e.g. not too many consecutive equal numbers), shuffle again. Be aware that doing this reduces the randomness of the result, not increases it.
For larger n it will take too long to find a result that passes your criteria using this method (because most results will fail). Instead you could generate elements one at a time with these rules:
If you already generated 4 ones in a row the next number must be zero and vice versa.
Otherwise, if you need to generate x more ones and y more zeros, the chance of the next number being one is x/(x+y).
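The two rules above could be sketched as follows. This is a hedged illustration, not the answerer's code: the name balanced_bits, the max_run parameter, and the restart-on-dead-end strategy are assumptions added here, since a forced choice can fail near the end when none of the required digit are left.

```python
import random


def balanced_bits(n, max_run=4, rng=random):
    """Return n bits with n // 2 ones and n - n // 2 zeros,
    with no run longer than max_run."""
    if max_run < 1:
        raise ValueError("max_run must be at least 1")
    while True:  # restart if the rules paint us into a corner
        remaining = {0: n - n // 2, 1: n // 2}
        seq = []
        run_len = 0
        for _ in range(n):
            last = seq[-1] if seq else None
            if run_len >= max_run:
                bit = 1 - last  # rule 1: the run must be broken
            else:
                ones, zeros = remaining[1], remaining[0]
                # rule 2: chance of a one is x / (x + y)
                bit = 1 if rng.random() < ones / (ones + zeros) else 0
            if remaining[bit] == 0:
                break  # forced digit is used up: abandon this attempt
            remaining[bit] -= 1
            run_len = run_len + 1 if bit == last else 1
            seq.append(bit)
        else:
            return seq  # loop finished without hitting a dead end


print(balanced_bits(20))
```

Restarting on a dead end keeps the code short, but it also slightly skews the distribution of results, in the same way the answer notes for reshuffling.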
You can use random.shuffle to randomize a list.
import random
n = 100
seq = [0]*(n//2) + [1]*(n-n//2)  # integer division so the list sizes are ints
random.shuffle(seq)
Now you can run through the list and whenever you see a run that's too long, swap an element to break up the sequence. I don't have any code for that part yet.
Having 6 1's in a row isn't particularly improbable -- are you sure you're not getting what you want?
There's a simple Python interface for a uniformly distributed random number, is that what you're looking for?
Here's my take on it. The first two functions are the actual implementation and the last function is for testing it.
The key is the first function, which looks at the last N elements of the list, where N+1 is the limit on how many times you want a number to appear in a row. It counts the ones among those elements and then returns 1 with probability (1 - n/N), where n is the number of ones present in that window. Note that this probability is 0 in the case of N consecutive ones and 1 in the case of N consecutive zeros.
Like a true random selection, there is no guarantee that the ratio of ones to zeros will be exactly 1, but averaged over thousands of runs, it does produce as many ones as zeros.
For longer lists, this will be better than repeatedly calling shuffle and checking that it satisfies your requirements.
import random

def next_value(selected):
    # Mathematically, this isn't necessary but it accounts for
    # potential problems with floating point numbers.
    if selected.count(0) == 0:
        return 0
    elif selected.count(1) == 0:
        return 1
    N = len(selected)
    selector = float(selected.count(1)) / N
    if random.uniform(0, 1) > selector:
        return 1
    else:
        return 0

def get_sequence(N, max_run):
    lim = min(N, max_run - 1)
    seq = [random.choice((1, 0)) for _ in range(lim)]
    for _ in range(N - lim):
        seq.append(next_value(seq[-max_run+1:]))
    return seq

def test(N, max_run, test_count):
    ones = 0.0
    zeros = 0.0
    for _ in range(test_count):
        seq = get_sequence(N, max_run)
        # Keep track of how many ones and zeros we're generating
        zeros += seq.count(0)
        ones += seq.count(1)
        # Make sure that the max_run isn't violated.
        counts = [0, 0]
        for i in seq:
            counts[i] += 1
            counts[not i] = 0
            if max_run in counts:
                print(seq)
                return
    # Print the ratio of zeros to ones. This should be around 1.
    print(zeros/ones)

test(200, 5, 10000)
Probably not the smartest way, but it works for "no sequential runs", while not generating the same number of 0s and 1s. See below for version that fits all requirements.
from random import choice

CHOICES = (1, 0)

def quasirandom(n, longest=3):
    serial = 0
    latest = 0
    result = []
    rappend = result.append
    for i in range(n):
        val = choice(CHOICES)
        if latest == val:
            serial += 1
        else:
            serial = 0
        if serial >= longest:
            val = CHOICES[val]  # flip the value: CHOICES[0] is 1, CHOICES[1] is 0
        rappend(val)
        latest = val
    return result

print(quasirandom(10))
print(quasirandom(100))
This one below corrects the filtering shuffle idea and works correctly AFAICT, with the caveat that the very last numbers might form a run. Pass debug=True to check that the requirements are met.
from random import random
from itertools import groupby  # For testing the result
try: xrange
except NameError: xrange = range

def generate_quasirandom(values, n, longest=3, debug=False):
    # Sanity check
    if len(values) < 2 or longest < 1:
        raise ValueError
    # Create a list with n * [val]
    source = []
    sourcelen = len(values) * n
    for val in values:
        source += [val] * n
    # For breaking runs
    serial = 0
    latest = None
    for i in xrange(sourcelen):
        # Pick something from the not yet placed part, source[i:]
        j = int(random() * (sourcelen - i)) + i
        if source[j] == latest:
            serial += 1
            if serial >= longest:
                serial = 0
                guard = 0
                # We got a serial run, break it
                while source[j] == latest:
                    j = int(random() * (sourcelen - i)) + i
                    guard += 1
                    # We just hit an infinite loop: there is no way to avoid a serial run
                    if guard > 10:
                        print("Unable to avoid serial run, disabling asserts.")
                        debug = False
                        break
        else:
            serial = 0
        latest = source[j]
        # Swap the picked value into position i
        source[i], source[j] = source[j], source[i]
    # More sanity checks
    check_quasirandom(source, values, n, longest, debug)
    return source

def check_quasirandom(shuffled, values, n, longest, debug):
    counts = []
    # We skip the last entries because breaking runs in them get too hairy
    for val, count in groupby(shuffled):
        counts.append(len(list(count)))
    highest = max(counts)
    print('Longest run: %d\nMax run length:%d' % (highest, longest))
    # Invariants
    assert len(shuffled) == len(values) * n
    for val in values:
        assert shuffled.count(val) == n
    if debug:
        # Only checked if we were able to avoid a sequential run >= longest
        assert highest <= longest

for x in xrange(10, 1000):
    generate_quasirandom((0, 1, 2, 3), 1000, x//10, debug=True)
