The run time is too long - python

When I test it with code, there is no error, but it fails with a timeout.
Problem
The failure rate is defined as follows.
Number of players who have reached the stage but not yet cleared it / number of players who have reached the stage
Given the total number of stages N and a list stages holding the stage each user is currently stopped at, return the stage numbers in descending order of failure rate.
Limitations
The number of stages N is a natural number between 1 and 500.
The length of stages is between 1 and 200,000.
stages contains natural numbers from 1 to N + 1.
Each natural number is the stage that the user is currently attempting.
N + 1 denotes a user who has cleared the final (Nth) stage.
Stages with the same failure rate are ordered by ascending stage number.
A stage that no user has reached has a failure rate of 0.
My code
def solution(N, stages):
    fail = []
    for i in range(1,N+1):
        no_clear = stages.count(i)
        he_stage = sum([stages.count(x) for x in range(i,N+2)])
        if no_clear==0:
            fail.append((i,0))
        else:
            fail.append((i,no_clear/he_stage))
    fail=sorted(fail,key=lambda x: (-x[1],x[0]))
    print(fail)
    return [fail[i][0] for i in range(N)]

I suppose stages is a list. Calling count repeatedly on a list is expensive, especially inside a loop: every call scans the whole list, and your code makes on the order of N² such calls.
You could use a cache, or maybe simpler: replace each stages.count(x) call with a lookup in a collections.Counter object.
before:
def solution(N, stages):
    fail = []
    for i in range(1,N+1):
        no_clear = stages.count(i)
        he_stage = sum([stages.count(x) for x in range(i,N+2)])
after:
import collections
def solution(N, stages):
    fail = []
    stages_counter = collections.Counter(stages)
    for i in range(1,N+1):
        no_clear = stages_counter[i]
        he_stage = sum(stages_counter[x] for x in range(i,N+2))
This will reduce your complexity a great deal. Elements are counted once and for all. Just access the dictionary in O(1) time once it's done.
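Going one step further, the inner sum over range(i, N+2) can also be dropped by keeping a running count of the players who have reached the current stage. The following is only a minimal sketch of that idea, assuming the problem statement above; the names counts, reached and stuck are illustrative and do not come from the original code.

```python
from collections import Counter

def solution(N, stages):
    counts = Counter(stages)   # players currently stuck on each stage
    reached = len(stages)      # every player has reached stage 1
    fail = []
    for stage in range(1, N + 1):
        stuck = counts[stage]
        # A stage nobody has reached gets a failure rate of 0.
        rate = stuck / reached if reached else 0
        fail.append((stage, rate))
        reached -= stuck       # players stuck here never reach the next stage
    # Highest failure rate first; ties broken by the smaller stage number.
    fail.sort(key=lambda pair: (-pair[1], pair[0]))
    return [stage for stage, _ in fail]

print(solution(5, [2, 1, 2, 6, 2, 4, 3, 3]))  # [3, 4, 2, 1, 5]
```

With this change the list is scanned once to build the counter, and the rest of the work is a single pass over the N stages plus the sort.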

Related

Algorithm: to distribute n elements into m buckets, such that the maximum sum of elements among buckets is minimum

The goal is to minimize(max(bucket1,bucket2,...,bucketn))
I decided that a greedy approach should work:
def algo(values,k = 4):
    values_sort = sorted(values,reverse = True)#sorting values in descending order
    buckets = [[] for b in range(k)]#creating buckets
    for i in range(len(values_sort)):#If buckets are empty add biggest elements to them
        if i < k:
            buckets[i].append(values_sort[i])
        else:# Greedy approach
            sums = [sum(b) for b in buckets]
            index = sums.index(min(sums))#add new element to the local minimum(the smallest sum of time among all buckets)
            buckets[index].append(values_sort[i])
    return buckets
I have compared my greedy solution to a random assignment:
#random assignment to the buckets
def algo_random(time,k):
    buckets = [[] for k in range(k)]
    count = 0
    for i in range(len(time)):
        buckets[count].append(time[i])
        count +=1
        if count == k:
            count = 0
    return buckets
I ran the code below, where I compared the greedy solution to the random assignment 1 million times:
from random import uniform
for i in range(1000000):
    time = [uniform(0, 1000.0) for i in range(100)]
    #algo random
    rand = algo_random(time,4)
    t_rand = max([sum(x) for x in rand])
    #algo optimal
    algo_o = algo(time,4)
    t_o = max([sum(x) for x in algo_o])
    if t_rand < t_o:
        print('False')
And in 2 cases out of 1 million, the random assignment was better than the greedy solution. This means that my algorithm (the greedy solution) is not optimal. Can you help me correct my algorithm?
EDIT: I have noticed that the algorithm works well for a large number of records and badly for a small number of records.
This problem is sometimes called the job shop scheduling problem (this particular variant is also known as multiprocessor scheduling or multiway number partitioning) and is known to be NP-hard, so no efficient algorithm, greedy or otherwise, is known that always produces the optimal solution.
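The greedy rule in the question (sort in descending order, then always add to the bucket with the smallest sum) is usually called LPT, for longest processing time first. It is known to stay within a factor of 4/3 - 1/(3k) of the optimum for k buckets, but it is not always optimal. The sketch below shows a small counterexample by comparing a heap-based LPT with an exhaustive search. The names lpt_makespan and optimal_makespan are illustrative, and the exhaustive search is exponential, so it is only usable on tiny inputs.

```python
import heapq
from itertools import product

def lpt_makespan(values, k):
    """Largest bucket sum produced by the greedy (LPT) rule."""
    loads = [0] * k                      # a list of zeros is already a valid heap
    for v in sorted(values, reverse=True):
        smallest = heapq.heappop(loads)  # bucket with the smallest current sum
        heapq.heappush(loads, smallest + v)
    return max(loads)

def optimal_makespan(values, k):
    """Smallest possible largest bucket sum, found by trying every assignment."""
    best = sum(values)
    for assignment in product(range(k), repeat=len(values)):
        loads = [0] * k
        for v, bucket in zip(values, assignment):
            loads[bucket] += v
        best = min(best, max(loads))
    return best

values = [3, 3, 2, 2, 2]
print(lpt_makespan(values, 2))      # 7, from buckets [3, 2, 2] and [3, 2]
print(optimal_makespan(values, 2))  # 6, from buckets [3, 3] and [2, 2, 2]
```

Using a heap also avoids recomputing every bucket sum for each element, which the original loop does.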

Determine the time complexity based on my codes and do some alteration

I have a question regarding the time complexity of my code. I have referred to a number of sites, tutorials, and Stack Overflow explanations, but I still do not understand how to calculate or determine the time complexity of code. If possible, could you check my code, explain based on it, and also provide some EASY examples or links (I'm still learning) to study?
I tried to submit this code, but its time complexity is too high, so it was not accepted. Is there a way to reduce the time complexity too?
The question is from this link
def large_element(array):
    stack = []
    took = False
    for i in range(len(array)):
        s_integer = array[i+1:]
        if i != len(array)-1:
            for ii in range(len(s_integer)):
                if array[i] < s_integer[ii] and took == False:
                    stack.append(s_integer[ii])
                    took = True
                elif array[i] > s_integer[ii] and took == False and ii == len(s_integer)-1:
                    stack.append(-1)
                    took = True
            took = False
        else:
            stack.append(-1)
    return stack
import time
start_time = time.time()
integer = [4,3,2,1]
print(large_element(integer))
My current understanding is that my code has two nested for loops over the elements, so this will be O(n²)?
By the way, the output is:
[-1, -1, -1, -1]
A simplified yet powerful way of doing this is to give each line of code a cost and count how many times that line is run. Of course, you should keep your lines of code simple for this to make sense.
The cost of a line does not need to be precise, as constants are ignored in Big O notation.
Simple example:
n equals the size of the list x
def print_list(x : list):
    for i in x:        # cost = c1; count = n
        print(i)       # cost = c2; count = n
    print('done')      # cost = c3; count = 1
The line with the for is called n times because although the for is executed only once, the comparison to decide if the loop should continue is made n times.
The time complexity of this function is equal to the sum of the products of the cost of each line and the amount of times it is repeated:
complexity = c1×n + c2×n + c3×1 = O(n)
In this case, the costs are ignored because they happen to be constants. However, the cost of an instruction can depend on the size of the input, most often when the instruction calls a subroutine.
Also, in the example I gave, each instruction was called at most n times, but in places like nested loops this count may be n², log(n), etc.
I would recommend reading the book Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein. This explanation is mostly based on what the book says in its first chapters.
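Applying this counting to large_element from the question: the outer loop runs n times, and each pass builds a slice and scans up to n - 1 elements, so the total is on the order of n². Assuming the task is the usual "next greater element to the right" problem (the linked statement is not reproduced here, so this is an assumption), a stack of indices brings the cost down to O(n), because every index is pushed once and popped at most once. The name next_greater is illustrative.

```python
def next_greater(array):
    """For each element, the first later element that is larger, or -1."""
    result = [-1] * len(array)
    waiting = []  # indices whose next greater element has not been seen yet
    for i, value in enumerate(array):
        # The current value answers every waiting index holding a smaller value.
        while waiting and array[waiting[-1]] < value:
            result[waiting.pop()] = value
        waiting.append(i)
    return result

print(next_greater([4, 3, 2, 1]))  # [-1, -1, -1, -1]
print(next_greater([1, 3, 2, 4]))  # [3, 4, 4, -1]
```

Although there is a while loop inside the for loop, the total number of pops over the whole run cannot exceed the number of pushes, which is n.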

How to refine number permutations for efficiency

I am working on a dice probability program and have been running into some efficiency issues in the permutation section when the numbers get big. For example, the parameters I am required to run are 10 dice, with 10 sides, with an outcome of 50.
I require a total number of permutations to calculate the probability of the specified outcome given the number of dice and number of sides. The final_count(total, dice, faces) function lets the least number of combinations pass from the generator before moving into the perms(x) function.
The following code works, but for the previously mentioned parameters it takes an extremely long time.
The perms(x) function was posted by @Ashish Datta in this thread:
permutations with unique values
This is where I believe I need help.
import itertools as it
total = 50
dice = 10
faces = 10
#-------------functions---------------------
# Checks for lists of ALL the same items
def same(lst):
    return lst[1:] == lst[:-1]
# Generates the number of original permutations (10 digits takes 1.65s)
def perms(x):
    uniq_set = set()
    for out in it.permutations(x, len(x)):
        if out not in uniq_set:
            uniq_set.update([out])
    return len(uniq_set)
# Finds total original dice rolls. "combinations" = (10d, 10f, 50t, takes 0.42s)
def final_count(total, dice, faces):
    combinations = (it.combinations_with_replacement(range(1, faces+1), dice))
    count = 0
    for i in combinations:
        if sum(i) == total and same(i) == True:
            count += 1
        elif sum(i) == total and same(i) != True:
            count += perms(i)
        else:
            pass
    return count
# --------------functions-------------------
answer = final_count(total, dice, faces) / float(faces**dice)
print(round(answer,4))
I have read the thread How to improve permutation algorithm efficiency with python. I believe my question is different, though a smarter algorithm is my end goal.
I originally posted my first draft of this program in CodeReview. https://codereview.stackexchange.com/questions/212930/calculate-probability-of-dice-total. I realize I am walking a fine line between a question and a code review, but I think in this case, I am more on the question side of things :)
You can use a function that deducts the current die roll from the total for the recursive calls, and short-circuits the search if the total is less than 1 or greater than the number of dice times the number of faces. Use a cache to avoid redundant calculations of the same parameters:
from functools import lru_cache
@lru_cache(maxsize=None)
def final_count(total, dice, faces):
    if total < 1 or total > dice * faces:
        return 0
    if dice == 1:
        return 1
    return sum(final_count(total - n, dice - 1, faces) for n in range(1, faces + 1))
so that:
final_count(50, 10, 10)
returns within a second: 374894389
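For comparison, the same count can be built bottom-up, one die at a time, which avoids recursion altogether. This is only a sketch of an alternative formulation, not part of the answer above; count_rolls and ways are illustrative names.

```python
def count_rolls(total, dice, faces):
    """Number of ordered rolls of `dice` dice with `faces` sides that sum to `total`."""
    ways = [1] + [0] * total  # with zero dice, only a sum of 0 is reachable
    for _ in range(dice):
        new_ways = [0] * (total + 1)
        for partial, count in enumerate(ways):
            if count == 0:
                continue
            for face in range(1, faces + 1):
                if partial + face > total:
                    break  # larger faces would overshoot the target as well
                new_ways[partial + face] += count
        ways = new_ways
    return ways[total]

print(count_rolls(7, 2, 6))     # 6
print(count_rolls(50, 10, 10))  # 374894389
```

The work is roughly dice × total × faces additions, so the 10-dice case finishes almost immediately.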
I had a similar solution to blhsing's, but he beat me to it and, to be honest, I didn't think of using lru_cache (nice! +1 for that). I'm posting it anyhow, if only to illustrate how storing previously computed counts cuts down on the recursion.
def permutationsTo(target, dices, faces, computed=dict()):
    if target > dices*faces or target < 1: return 0
    if dices == 1 : return 1
    # faces is part of the key because the default dict is shared between calls
    if (target,dices,faces) in computed: return computed[(target,dices,faces)]
    result = 0
    for face in range(1,min(target,faces+1)):
        result += permutationsTo(target-face,dices-1,faces,computed)
    computed[(target,dices,faces)] = result
    return result
One way to greatly reduce the time is to mathematically count how many combinations there are for each unique group of numbers in combinations, and increment count by that amount. If you have a list of n objects where x1 of them are all alike, x2 of them are all alike, etc., then the total number of ways to arrange them is n!/(x1! x2! x3! ...). For example, the number of different ways to arrange the letters of "Tennessee" is 9!/(1! 4! 2! 2!). So you can make a separate function for this:
import math
import itertools as it
import time
# Count the number of ways to arrange a list of items where
# some of the items may be identical.
def indiv_combos(thelist):
    prod = math.factorial(len(thelist))
    for i in set(thelist):
        icount = thelist.count(i)
        prod //= math.factorial(icount)  # integer division keeps the count exact
    return prod
def final_count2(total, dice, faces):
    combinations = it.combinations_with_replacement(range(1, faces + 1), dice)
    count = 0
    for i in combinations:
        if sum(i) == total:
            count += indiv_combos(i)
    return count
I don't know off-hand if there's already some built-in function that does the job of what I wrote as indiv_combos, but you could also use Counter to do the counting and mul to take the product of a list:
from functools import reduce
from operator import mul
from collections import Counter
def indiv_combos(thelist):
    return math.factorial(len(thelist)) // reduce(mul, [math.factorial(i) for i in Counter(thelist).values()],1)
I get mixed results on the times when I try both methods with (25, 10, 10) as the input, but both give me the answer in less than 0.038 seconds every time.

Randomly sampling numerals from a list whose aggregate needs to be at least greater than a given benchmark

I have a list of tuples formed by 1000 object ids and their scores, i.e.:
scored_items = [('14',534.9),('4',86.0),('78',543.21),....].
Let T be the aggregated score of the top 20 highest scoring items.
That's easy. Using python:
top_20 = sorted(scored_items, key=lambda k: k[1],reverse = True)[:20]
T = sum(n for _, n in top_20)
Next, let t equal a quarter of T. I.e. in python: t = math.ceil(T/4)
My question is: what's the most efficient way to randomly select 20 items (without replacement) from scored_items such that their aggregated score is equal to or greater than (but never lower than) t? They may or may not include items from top_20.
Would prefer an answer in Python, and would prefer to not rely on external libraries much
Background: This is an item-ranking algorithm that is strategy proof according to an esoteric - but useful - Game Theory theorem. Source: section 2.5 in this paper, or just read footnote 18 on page 11 of this same link. Btw strategy proof essentially means it's tough to game it.
I'm a neophyte python programmer and have been mulling how to solve this problem for a while now, but just can't seem to wrap my head around it. Would be great to know how the experts would approach and solve this.
I suppose the most simplistic (and least performant perhaps) way is to keep randomly generating sets of 20 items till their scores' sum exceeds or equals t.
But there has to be a better way to do this right?
Here is an implementation of what I mentioned in the comments.
Since we want items such that the sum of the scores is large, we can weight the choice so that we are more likely to pick samples with large scores.
import numpy as np
import math
def normalize(p):
    return p/sum(p)
def get_sample(scored_items, N=20, max_iter = 1000):
    topN = sorted(scored_items, key=lambda k: k[1],reverse = True)[:N]
    T = sum(n for _, n in topN)
    t = math.ceil(T/4)
    i = 0
    scores = np.array([x[1] for x in scored_items])
    p=normalize(scores)
    while i < max_iter:
        # index into scored_items itself rather than a global list of ids
        sample_indexes = np.random.choice(a=range(len(scored_items)), size=N, replace=False, p=p)
        sample = [scored_items[x] for x in sample_indexes]
        if sum(n for _, n in sample) >= t:
            print("Found a solution at iteration %d"%i)
            return sample
        i+=1
    print("Could not find a solution after %d iterations"%max_iter)
    return None
An example of how to use it:
np.random.seed(0)
ids = range(1000)
scores = 10000*np.random.random_sample(size=len(ids))
scored_items = list(zip(map(str, ids), scores))
sample = get_sample(scored_items, 20)
#Found a solution at iteration 0
print(sum(n for _, n in sample))
#139727.1229832652
Though this is not guaranteed to get a solution, I ran this in a loop 100 times and each time a distinct solution was found on the first iteration.
Though I do not know of an efficient way for huge lists, something like this works even for 1000 or so items. You can do a bit better if you don't need true randomness.
import random
testList = [x for x in range(1,1000)]
t = sum(range(975, 1000))/4
while True:
    rs = random.sample(testList, 15)
    if sum(rs) >= t: break
print(rs)
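Since the question asks for a solution without external libraries, here is a standard-library sketch of the retry idea using the question's own definitions of T and t. Drawing uniformly and rejecting samples below the threshold keeps every qualifying set of items equally likely, whereas weighting the draw by score favors high-scoring items. The function name sample_at_least and the max_tries and rng parameters are assumptions made for illustration.

```python
import math
import random

def sample_at_least(scored_items, n=20, max_tries=10_000, rng=random):
    """Uniformly sample n items whose total score is at least t = ceil(T / 4)."""
    top = sorted(scored_items, key=lambda item: item[1], reverse=True)[:n]
    threshold = math.ceil(sum(score for _, score in top) / 4)
    for _ in range(max_tries):
        candidate = rng.sample(scored_items, n)  # without replacement
        if sum(score for _, score in candidate) >= threshold:
            return candidate
    return None  # give up after max_tries unsuccessful draws

items = [(str(i), float(i)) for i in range(1000)]
picked = sample_at_least(items)
print(len(picked), sum(score for _, score in picked))
```

How many retries are needed depends on how the scores are distributed: when a few items hold most of the total, most uniform draws fall short of t and this loop becomes slow.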

An algorithm for randomly generating integer partitions of a particular length, in Python?

I've been using the random_element() function provided by SAGE to generate random integer partitions for a given integer (N) that are a particular length (S). I'm trying to generate unbiased random samples from the set of all partitions for given values of N and S. SAGE's function quickly returns random partitions for N (i.e. Partitions(N).random_element()).
However, it slows immensely when adding S (i.e. Partitions(N,length=S).random_element()). Likewise, filtering out random partitions of N that are of length S is incredibly slow.
However, and I hope this helps someone, I've found that in the case when the function returns a partition of N not matching the length S, that the conjugate partition is often of length S. That is:
S = 10
N = 100
while True:
    part = list(Partitions(N).random_element())
    if len(part) != S:
        SAD = list(Partition(part).conjugate())
        if len(SAD) != S:
            continue
        part = SAD
    break
This increases the rate at which partitions of length S are found and appears to produce unbiased samples (I've examined the results against entire sets of partitions for various values of N and S).
However, I'm using values of N (e.g. 10,000) and S (e.g. 300) that make even this approach impractically slow. The comment associated with SAGE's random_element() function admits there is plenty of room for optimization. So, is there a way to more quickly generate unbiased (i.e. random uniform) samples of integer partitions matching given values of N and S, perhaps, by not generating partitions that do not match S? Additionally, using conjugate partitions works well in many cases to produce unbiased samples, but I can't say that I precisely understand why.
Finally, I have a definitively unbiased method that has a zero rejection rate. Of course, I've tested it to make sure the results are representative samples of entire feasible sets. It's very fast and totally unbiased. Enjoy.
from sage.all import *
import random
First, a function to find the smallest maximum addend for a partition of n with s parts
def min_max(n,s):
    _min = int(floor(float(n)/float(s)))
    if int(n%s) > 0:
        _min +=1
    return _min
Next, A function that uses a cache and memoiziation to find the number of partitions
of n with s parts having x as the largest part. This is fast, but I think there's
a more elegant solution to be had. e.g., Often: P(N,S,max=K) = P(N-K,S-1)
Thanks to ante (https://stackoverflow.com/users/494076/ante) for helping me with this:
Finding the number of integer partitions given a total, a number of parts, and a maximum summand
D = {}
def P(n,s,x):
    if n > s*x or x <= 0: return 0
    if n == s*x: return 1
    if (n,s,x) not in D:
        D[(n,s,x)] = sum(P(n-i*x, s-i, x-1) for i in range(s))
    return D[(n,s,x)]
Finally, a function to find uniform random partitions of n with s parts, with no rejection rate! Each randomly chosen number codes for a specific partition of n having s parts.
def random_partition(n,s):
    S = s
    partition = []
    _min = min_max(n,S)
    _max = n-S+1
    total = number_of_partitions(n,S)
    which = random.randrange(1,total+1) # random number
    while n:
        for k in range(_min,_max+1):
            count = P(n,S,k)
            if count >= which:
                count = P(n,S,k-1)
                break
        partition.append(k)
        n -= k
        if n == 0: break
        S -= 1
        which -= count
        _min = min_max(n,S)
        _max = k
    return partition
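For readers without SAGE, the same "count, then choose" idea can be sketched in plain Python with the standard recurrence p(n, k) = p(n-1, k-1) + p(n-k, k) for the number of partitions of n into exactly k parts: either the smallest part is 1, or every part is at least 2 and 1 can be subtracted from each. Picking each case with probability proportional to its count gives a uniform sample with no rejection. This is a different formulation from the code above, and the names partition_table and random_partition are illustrative. The table takes O(n·k) memory.

```python
import random

def partition_table(n, k):
    """table[m][j] = number of partitions of m into exactly j positive parts."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for m in range(1, n + 1):
        for j in range(1, min(m, k) + 1):
            table[m][j] = table[m - 1][j - 1] + table[m - j][j]
    return table

def random_partition(n, k, rng=random):
    """Uniform random partition of n into exactly k parts, largest part first."""
    table = partition_table(n, k)
    if table[n][k] == 0:
        raise ValueError("n cannot be partitioned into k positive parts")
    parts = []
    offset = 0  # amount already added to every part that is still undecided
    m, j = n, k
    while j > 0:
        if rng.randrange(table[m][j]) < table[m - 1][j - 1]:
            parts.append(1 + offset)  # smallest remaining part is fixed
            m -= 1
            j -= 1
        else:
            offset += 1               # every remaining part is at least one larger
            m -= j
    return parts[::-1]

print(random_partition(100, 10))
```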
I ran into a similar problem when I was trying to calculate the probability of the strong birthday problem.
First off, the partition function explodes when given even a modest amount of numbers. You'll be returning a LOT of information. No matter which method you're using, N = 10000 and S = 300 will generate ridiculous amounts of data. It will be slow. Chances are any pure Python implementation you use will be equally slow or slower. Look into making a C module.
If you want to try Python, the approach I took was a combination of itertools and generators to keep memory usage down. I don't seem to have my code handy anymore, but here's a good implementation:
http://wordaligned.org/articles/partitioning-with-python
EDIT:
Found my code:
def partition(a, b=-1, limit=365):
    if (b == -1):
        b = a
    if (a == 2 or a == 3):
        if (b >= a and limit):
            yield [a]
        else:
            return
    elif (a > 3):
        if (a <= b):
            yield [a]
        c = 0
        if b > a-2:
            c = a-2
        else:
            c = b
        for i in range(c, 1, -1):
            if (limit):
                for j in partition(a-i, i, limit-1):
                    yield [i] + j
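Simple approach: randomly assign the integers (note that some parts may end up as 0, and the result is not a uniform sample over partitions):
import random
def random_partition(n, s):
    partition = [0] * s
    for x in range(n):
        partition[random.randrange(s)] += 1
    return partition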
