I have a problem with a probability algorithm.
The goal is to obtain a list containing three items, the FinalList.
There are four source lists:
ALIST, BLIST, CLIST, DLIST
They are all of unknown length, and each contains unique elements.
(In fact, they are all empty when the program starts. They are fetched from a Redis sorted list and grow while the program runs.)
Items are picked at random from these source lists to generate the FinalList,
subject to the following requirements.
In the FinalList:
the probability that an item comes from ALIST is 43%
the probability that an item comes from BLIST is 37%
the probability that an item comes from CLIST is 19%
the probability that an item comes from DLIST is 1%
I have written some code, but it only works when all four lists have plenty of elements.
from random import choice
final_list = []
slot = []
a_picked_times = 0
while a_picked_times < 43:
    item = choice(ALIST)
    ALIST.remove(item)
    if item in already_picked_list:
        continue
    slot.append(item)
    a_picked_times += 1
b_picked_times = 0
while b_picked_times < 37:
    ...
    # similar code for BLIST, then CLIST and DLIST
# now slot is a list which contains 100 elements:
# 43 of ALIST's items, 37 of B, 19 of C, 1 of D
for i in range(3):
    final_list.append( choice(slot) )
So this satisfies the probability requirements, BUT only under one condition: the four lists must have plenty of elements.
As long as list.remove(item) does not empty a list, we can pick the required number of items from it.
When A, B, C or D is empty or does not have enough elements, how can I still ensure the probability requirements?
The A, B, C and D lists all come from Redis sorted lists. Is there perhaps a solution using Redis?
It might make more sense to (for each element) pick a number between 1 and 100 and then select a source list based on that.
As I understand it, you're generating lists of random sizes, then you want to choose 3 with the given probability. If my understanding is correct, then you need to simply generate a uniform variate on [0,1] with random.uniform(0., 1.).
Then simply partition the 0..1 interval into the appropriate lengths:
import random
for i in range(3):
    r = random.uniform(0., 1.)
    if r < .43:
        final_list.append(random.choice(ALIST))
    elif r < .43 + .37:
        final_list.append(random.choice(BLIST))
    elif r < .43 + .37 + .19:
        final_list.append(random.choice(CLIST))
    else:
        final_list.append(random.choice(DLIST))
Choosing from the lists should be easy, since you just pick an index.
Note that this is equivalent to Ofir's answer, but may or may not appeal to you more.
So from what I can gather, your code is removing exactly 43 ALIST elements, 37 BLIST elements, etc.
A better solution would be to construct your final_list by using the given probabilities. This will also take into account when your other lists are empty.
import random
def getEl(source):
    # Return a random element of source, or None if source is empty.
    try:
        return random.choice(source)
    except IndexError:
        return None
# Cumulative probability thresholds
ALIST_PROB = 0.43
BLIST_PROB = ALIST_PROB + 0.37
CLIST_PROB = BLIST_PROB + 0.19
DLIST_PROB = CLIST_PROB + 0.01  # the remaining 1%, handled by the else branch
final_list = []
# Note: this loops forever if all four lists are empty.
while len(final_list) < 3:
    # generate a random number in [0, 1)
    rand = random.random()
    if rand < ALIST_PROB:
        element = getEl(ALIST)
    elif rand < BLIST_PROB:
        element = getEl(BLIST)
    elif rand < CLIST_PROB:
        element = getEl(CLIST)
    else:
        element = getEl(DLIST)
    if element is not None:
        final_list.append(element)
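For the case the question asks about (some lists empty or nearly empty), here is a minimal sketch rather than a definitive implementation. It assumes Python 3.6+ for random.choices, that weights should be renormalised over the lists that currently have items (which gives the same distribution as redrawing until a non-empty list comes up), and that a picked item should not be picked twice. The names WEIGHTS and pick_items are made up for this example.

```python
import random

# Relative weights from the question (43% / 37% / 19% / 1%).
WEIGHTS = {"A": 43, "B": 37, "C": 19, "D": 1}

def pick_items(sources, k=3, rng=random):
    """Pick up to k items from a dict of name -> list.

    Each draw first chooses a non-empty source according to WEIGHTS,
    then takes one item uniformly from it (assumption: no repeats).
    """
    # Work on copies so the caller's lists are left untouched.
    pools = {name: list(items) for name, items in sources.items()}
    picked = []
    while len(picked) < k:
        names = [name for name in pools if pools[name]]
        if not names:
            break  # every source is exhausted; return what we have
        weights = [WEIGHTS[name] for name in names]
        name = rng.choices(names, weights=weights, k=1)[0]
        pool = pools[name]
        picked.append(pool.pop(rng.randrange(len(pool))))
    return picked
```

Unlike the retry loop, this stops (possibly with fewer than three items) when every list is empty instead of spinning forever. If the lists live in Redis, the same idea should apply after fetching their current contents.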
I'm trying to write the fastest algorithm possible to return the number of "magic triples" (i.e. x, y, z where z is a multiple of y and y is a multiple of x) in a list of 3-2000 integers.
(Note: I believe the list was expected to be sorted and unique but one of the test examples given was [1,1,1] with the expected result of 1 - that is a mistake in the challenge itself though because the definition of a magic triple was explicitly noted as x < y < z, which [1,1,1] isn't. In any case, I was trying to optimise an algorithm for sorted lists of unique integers.)
I haven't been able to work out a solution that doesn't include having three consecutive loops and therefore being O(n^3). I've seen one online that is O(n^2) but I can't get my head around what it's doing, so it doesn't feel right to submit it.
My code is:
def solution(l):
    if len(l) < 3:
        return 0
    elif l == [1,1,1]:
        return 1
    else:
        halfway = int(l[-1]/2)
        quarterway = int(halfway/2)
        quarterIndex = 0
        halfIndex = 0
        for i in range(len(l)):
            if l[i] >= quarterway:
                quarterIndex = i
                break
        for i in range(len(l)):
            if l[i] >= halfway:
                halfIndex = i
                break
        triples = 0
        for i in l[:quarterIndex+1]:
            for j in l[:halfIndex+1]:
                if j != i and j % i == 0:
                    multiple = 2
                    while (j * multiple) <= l[-1]:
                        if j * multiple in l:
                            triples += 1
                        multiple += 1
        return triples
I've spent quite a lot of time going through examples manually and removing loops through unnecessary sections of the lists, but this still completes a list of 2,000 integers in about a second, whereas the O(n^2) solution I found completes the same list in 0.6 seconds. It seems like such a small difference, but it means mine takes roughly two-thirds longer.
Am I missing a really obvious way of removing one of the loops?
Also, I saw mention of making a directed graph and I see the promise in that. I can make the list of first nodes from the original list with a built-in function, so in principle I presume that means I can make the overall graph with two for loops and then return the length of the third node list, but I hit a wall with that too. I just can't seem to make progress without that third loop!!
from array import array
def num_triples(l):
    n = len(l)
    lower_counts = array("I", (0 for _ in range(n)))
    upper_counts = lower_counts[:]
    for i in range(n - 1):
        lower = l[i]
        for j in range(i + 1, n):
            upper = l[j]
            if upper % lower == 0:
                lower_counts[i] += 1
                upper_counts[j] += 1
    return sum(nx * nz for nz, nx in zip(lower_counts, upper_counts))
Here, lower_counts[i] is the number of pairs of which the ith number is the y, and z is the other number in the pair (i.e. the number of different z values for this y).
Similarly, upper_counts[i] is the number of pairs of which the ith number is the y, and x is the other number in the pair (i.e. the number of different x values for this y).
So the number of triples in which the ith number is the y value is just the product of those two numbers.
The use of an array here for storing the counts is for scalability of access time. Tests show that up to n=2000 it makes negligible difference in practice, and even up to n=20000 it only made about a 1% difference to the run time (compared to using a list), but it could in principle be the fastest growing term for very large n.
How about using itertools.combinations instead of nested for loops? Combined with a generator expression it's cleaner, although it still examines every triple, so it remains O(n^3). Let's say l = [your list of integers] and let's assume it's already sorted.
from itertools import combinations
def div(i,j,k): # this function has the logic
    return l[k]%l[j]==l[j]%l[i]==0
# combinations() already yields index tuples with i < j < k
r = sum(div(i,j,k) for i,j,k in combinations(range(len(l)),3))
@alaniwi provided a very smart iterative solution.
Here is a recursive solution.
def find_magicals(lst, nplet):
    """Find the number of magical n-plets in a given lst"""
    res = 0
    for i, base in enumerate(lst):
        # find all the multiples of current base
        multiples = [num for num in lst[i + 1:] if not num % base]
        res += len(multiples) if nplet <= 2 else find_magicals(multiples, nplet - 1)
    return res
def solution(lst):
    return find_magicals(lst, 3)
The problem can be broken down as follows: for each number in the original list taken as the base (i.e. x), count how many pairs (y, z) can be found among its multiples later in the list. Since finding pairs works the same way as finding triples, we can solve the problem recursively.
From my testing, this recursive solution is comparable to, if not more performant than, the iterative solution.
This answer was the first suggestion by @alaniwi and is the one I've found to be the fastest (at 0.59 seconds for a 2,000 integer list).
def solution(l):
    n = len(l)
    lower_counts = dict((val, 0) for val in l)
    upper_counts = lower_counts.copy()
    for i in range(n - 1):
        lower = l[i]
        for j in range(i + 1, n):
            upper = l[j]
            if upper % lower == 0:
                lower_counts[lower] += 1
                upper_counts[upper] += 1
    return sum((lower_counts[y] * upper_counts[y] for y in l))
I think I've managed to get my head around it. It compares each number in the list with every larger number after it to see whether the larger is divisible by the smaller, and builds two dictionaries:
One with the number of larger numbers that each number divides (lower_counts),
One with the number of smaller numbers that divide each number (upper_counts).
You then multiply the two values for each key and sum the products. A key with a 0 in either dictionary cannot be the second number in a triple, so it contributes nothing.
Example:
l = [1,2,3,4,5,6]
lower_counts = {1:5, 2:2, 3:1, 4:0, 5:0, 6:0}
upper_counts = {1:0, 2:1, 3:1, 4:2, 5:1, 6:3}
triple_tuple = ([1,2,4], [1,2,6], [1,3,6])
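To convince yourself that the counting trick gives the same answer as checking every triple, a small cross-check can help. This is only a sketch: the function names are invented here, and the second function keeps its counts per position rather than per value (an assumption made so that repeated values such as [1,1,1] are counted by position).

```python
from itertools import combinations

def count_triples_bruteforce(l):
    """O(n^3) reference: count position-ordered (x, y, z) with x | y and y | z."""
    return sum(1 for x, y, z in combinations(l, 3)
               if y % x == 0 and z % y == 0)

def count_triples_by_middle(l):
    """O(n^2): for each middle element, (divisors before it) * (multiples after it)."""
    n = len(l)
    smaller = [0] * n  # smaller[j]: earlier elements that divide l[j]
    larger = [0] * n   # larger[i]: later elements that l[i] divides
    for i in range(n - 1):
        for j in range(i + 1, n):
            if l[j] % l[i] == 0:
                larger[i] += 1
                smaller[j] += 1
    return sum(s * g for s, g in zip(smaller, larger))

print(count_triples_bruteforce([1, 2, 3, 4, 5, 6]))  # 3
print(count_triples_by_middle([1, 2, 3, 4, 5, 6]))   # 3
```

Both agree with the three triples listed in the example above.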
I have an array and I would like to split it into two parts such that their sums are equal. For example, [10, 30, 20, 40] can be split into [10, 40] and [20, 30]; both have a sum of 50. This is essentially the partition problem, but I'd like to retrieve the subsets, not just determine whether the array is partitionable. So I went ahead and did the following:
Update: updated script to handle duplicates
from collections import Counter
def is_partitionable(a):
    possible_sums = [a[0]]
    corresponding_subsets = [[a[0]]]
    target_value = sum(a)/2
    if a[0] == target_value:
        print("yes",[a[0]],a[1:])
        return
    for x in a[1:]:
        temp_possible_sums = []
        for (ind, t) in enumerate(possible_sums):
            cursum = t + x
            if cursum < target_value:
                corresponding_subsets.append(corresponding_subsets[ind] + [x])
                temp_possible_sums.append(cursum)
            if cursum == target_value:
                one_subset = corresponding_subsets[ind] + [x]
                another_subset = list((Counter(a) - Counter(one_subset)).elements())
                print("yes", one_subset,another_subset)
                return
        possible_sums.extend(temp_possible_sums)
    print("no")
    return
is_partitionable(list(map(int, input().split())))
Sample Input & Output:
>>> is_partitionable([10,30,20,40])
yes [10, 40] [30, 20]
>>> is_partitionable([10,30,20,20])
yes [10, 30] [20, 20]
>>> is_partitionable([10,30,20,10])
no
I'm essentially storing the corresponding values that were added to get a value in corresponding_subsets. But, as the size of a increases, it's obvious that the corresponding_subsets would have way too many sub-lists (equal to the number of elements in possible_sums). Is there a better/more efficient way to do this?
Though it is still a hard problem, you could try the following. Assume there are n elements stored in an array named arr (with 1-based indexing). We make two teams, A and B, and partition the elements of arr between them so that the sum of the elements in both teams is equal. Each element of arr goes either to team A or to team B. If the ith element goes to team A we record it as -arr[i], and if it goes to team B we record it as arr[i]. After assigning every element to a team, if the total sum is 0 our job is done. We create n sets (which do not store duplicates). I will work with the example arr = {10,20,30,40}. The steps are:
set_1 = {10,-10} # -10 if it goes to Team A and 10 if goes to B
set_2 = {30,-10,10,-30} # four options as we add -20 and 20
set_3 = {60,0,40,-20,20,-40,-60} # note we don't need to store duplicates
set_4 = {100,20,40,-40,80,0,60,-20,-60,-80,-100} # there is a zero, which means our task is possible
Now all you have to do is backtrack from the 0 in the last set to see whether the ith element was added as arr[i] or as -arr[i], i.e. whether it was added to Team A or B.
EDIT
The backtracking routine. So we have n sets from set_1 to set_n. Let us make two lists list_A to push the elements that belong to team A and similarly list_B. We start from set_n , thus using a variable current_set initially having value n. Also we are focusing at element 0 in the last list, thus using a variable current_element initially having value 0. Follow the approach in the code below ( I assume all sets 1 to n have been formed, for sake of ease I have stored them as list of list, but you should use set data structure ). Also the code below assumes a 0 is seen in the last list ie. our task is possible.
sets = [[0],  # this dummy set (set_0) is important,
        # because initially we add -arr[1] or arr[1] to 0
        [10,-10],
        [30,-10,10,-30],
        [60,0,40,-20,20,-40,-60],
        [100,20,40,-40,80,0,60,-20,-60,-80,-100]]
# my array is 1 based so ignore the zero
arr = [0,10,20,30,40]
list_A = []
list_B = []
current_element = 0
current_set = 4 # Total number of sets in this case is n=4
while current_set >= 1:
    print(current_set, current_element)
    for element in sets[current_set-1]:
        if element + arr[current_set] == current_element:
            list_B.append(arr[current_set])
            current_element = element
            current_set -= 1
            break
        elif element - arr[current_set] == current_element:
            list_A.append(arr[current_set])
            current_element = element
            current_set -= 1
            break
print(list_A, list_B)
This is my implementation of @sasha's algorithm for checking feasibility.
def my_part(my_list):
    # Note: this consumes my_list (elements are popped off the end).
    item = my_list.pop()
    temp = [item, -item]
    while len(my_list) != 0:
        new_player = my_list.pop()
        balance = []  # start fresh so stale partial sums are not carried over
        for items in temp:
            balance.append(items + new_player)
            balance.append(items - new_player)
        temp = balance[:]
    if 0 in set(temp):
        return 'YES'
    else:
        return 'NO'
I am working on the backtracking too.
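In case it helps with the backtracking part, here is one possible sketch that combines the feasibility check and the backtracking. It is my own arrangement, not @sasha's exact code: instead of plain sets, each layer is a dict mapping a reachable signed sum to the sum it was reached from, so the walk back from 0 needs no searching. The name split_equal_sum is made up for this example.

```python
def split_equal_sum(values):
    """Return (team_a, team_b) with equal sums, or None if no split exists."""
    # layers[k] maps each signed sum reachable after k elements
    # to one sum in layers[k-1] that it can be reached from.
    layers = [{0: None}]
    for v in values:
        nxt = {}
        for s in layers[-1]:
            nxt.setdefault(s + v, s)  # v goes to team B (+v)
            nxt.setdefault(s - v, s)  # v goes to team A (-v)
        layers.append(nxt)
    if 0 not in layers[-1]:
        return None
    team_a, team_b = [], []
    current = 0
    # Walk back from 0 in the last layer, recovering each element's sign.
    for k in range(len(values), 0, -1):
        prev = layers[k][current]
        v = values[k - 1]
        if current - prev == v:
            team_b.append(v)
        else:
            team_a.append(v)
        current = prev
    return team_a, team_b

print(split_equal_sum([10, 30, 20, 10]))  # None
```

For integer inputs each layer holds at most 2 * sum(values) + 1 entries, so memory is bounded by the size of the sums rather than by the number of subsets.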
Hi I've been reading up on finding the minimum of a multidimensional list, but if I have an N x N x 4 list, how do I get the minimum between every single 4th element? All other examples have been for a small example list using real indices. I suppose I'll be needing to define indices in terms of N....
[[[0,1,2,3],[0,1,2,3],...N],[[0,1,2,3],[0,1,2,3],...N].....N]
And then there's retrieving their indices.
I don't know what to try.
If anyone's interested in the actual piece of code:
relative = [[[[100] for k in range(5)] for j in range(N)] for i in range(N)]
What the following does is fill in the 4th element with times satisfying the mathematical equations. The 0th, 1st, 2nd and 3rd elements of relative hold positions and velocities. The 4th spot is for the time taken for the ith and jth particles to collide (redundant values such as i-i or j-i are filled with the value 100, because it's big enough for the min function not to retrieve it). I need the shortest collision time (hence the 4th element comparisons).
def time(relative):
    i = 0
    t = 0
    while i<N:
        j = i+1
        while j<N and i<N:
            rv = relative[i][j][0]*relative[i][j][2]+relative[i][j][1]*relative[i][j][3] #Dot product of r and v
            if rv<0:
                rsquared = (relative[i][j][0])**2+(relative[i][j][1])**2
                vsquared = (relative[i][j][2])**2+(relative[i][j][3])**2
                det = (rv)**2-vsquared*(rsquared-diameter**2)
                if det<0:
                    t = 100 #For negative times, assign an arbitrarily large number to make sure min() wont pick it up.
                elif det == 0:
                    t = -rv/vsquared
                elif det>0:
                    t1 = (-rv+sqrt((rv)**2-vsquared*(rsquared-diameter**2)))/(vsquared)
                    t2 = (-rv-sqrt((rv)**2-vsquared*(rsquared-diameter**2)))/(vsquared)
                    if t1-t2>0:
                        t = t2
                    elif t1-t2<0:
                        t = t1
            elif rv>=0:
                t = 100
            relative[i][j][4]=t #Put the times inside the relative list for element ij.
            j = j+1
        i = i+1
    return relative
I've tried:
t_fin = min(relative[i in range(0,N-1)][j in range(0,N-1)][4])
This runs but always returns 100, even though I've checked that it isn't the smallest element.
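If you want the min of the 4th element (index 3) of an NxNx4 list, use the line below (change x[3] to x[4] if the times sit in the slot written by relative[i][j][4]=t):
min([x[3] for lev1 in relative for x in lev1])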
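Since the question also asks for the indices, here is a rough sketch of one way to get them. It assumes the collision times are plain numbers stored at index 4 (as in relative[i][j][4]=t) and that only pairs with i < j matter. The function name and the small table are invented for illustration.

```python
def earliest_collision(relative, time_index=4):
    """Return (t, i, j): the smallest time among pairs i < j and its indices."""
    n = len(relative)
    # Tuples compare element by element, so min() picks the smallest time
    # and carries the matching indices along with it.
    return min(
        (relative[i][j][time_index], i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

# Hypothetical 3-particle table: [x, y, vx, vy, time], with 100.0 as "no collision".
N = 3
relative = [[[0.0, 0.0, 0.0, 0.0, 100.0] for _ in range(N)] for _ in range(N)]
relative[0][1][4] = 2.5
relative[0][2][4] = 0.75
relative[1][2][4] = 4.0

t, i, j = earliest_collision(relative)
print(t, i, j)  # 0.75 0 2
```

The attempt with relative[i in range(0,N-1)]... does not loop: i in range(...) is a single True/False value, which Python then uses as index 1 or 0.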
I have a list of 40 elements. I am trying to estimate how many times I need to sample this list in order to reproduce all elements in that list. However, it is important that I replace the picked element. I.e. it is possible that I will pick the same element 20 times. So far I have the following
import random
l = range(0,40)
seen=[]
x=0
while len(seen)<len(l):
    r = random.choice(l)
    if r not in seen:
        seen.append(r)
        x=x+1
print(x)
However, this always reports that it took 40 draws to accomplish what I want, as if a single element were never selected twice.
Eventually I would run this function 1000 times to get a feel for how often I would have to sample.
As always, thanks.
You just need to adjust the indentation of x=x+1: right now you only increment when the value has not been seen before.
If you do this often with a lot of items, consider using a set for seen, because membership tests are faster on average.
import random
l = range(0, 40)
seen = set()
x = 0
while len(seen) < len(l):
    r = random.choice(l)
    if r not in seen:
        seen.add(r)
    x = x + 1
print(x)
Here is a similar method to do it. Initialize a set, which by definition may only contain unique elements (no duplicates). Then keep using random.choice() to choose an element from your list. You can compare your set to the original list, and until they are the same size, you don't have every element. Keep a counter to see how many random choices it takes.
import random
def sampleValues(l):
    counter = 0
    values = set()
    while len(values) < len(l):
        values.add(random.choice(l))
        counter += 1
    return counter
>>> l = list(range(40))
This number will vary, you could Monte Carlo to get some stats
>>> sampleValues(l)
180
>>> sampleValues(l)
334
>>> sampleValues(l)
179
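For the "run it 1000 times" part, a sketch along these lines might do. The function names are made up here, and the closed-form comparison uses the standard coupon-collector expectation n * (1 + 1/2 + ... + 1/n), which comes to roughly 171 draws for n = 40.

```python
import random

def draws_to_see_all(items, rng):
    """Sample with replacement until every item has appeared; return the draw count."""
    seen = set()
    draws = 0
    while len(seen) < len(items):
        seen.add(rng.choice(items))
        draws += 1
    return draws

def average_draws(items, runs=1000, seed=None):
    """Monte Carlo estimate of the mean number of draws over `runs` trials."""
    rng = random.Random(seed)
    return sum(draws_to_see_all(items, rng) for _ in range(runs)) / float(runs)

def expected_draws(n):
    """Coupon-collector expectation: n * H_n, with H_n the nth harmonic number."""
    return n * sum(1.0 / k for k in range(1, n + 1))

items = list(range(40))
print(average_draws(items, runs=1000))  # varies from run to run
print(expected_draws(40))
```

Individual runs scatter widely (as the 180 / 334 / 179 above suggest), so the average over many runs is the more useful figure.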
I am working with Python 3.2 and I have spent a lot of time troubleshooting this, but I still can't seem to wrap my brain around it.
number = random.randint ( x0 ,xn )
I'm generating a random number. Its purpose is to make my code come at me differently every time.
For example, I have 10 variables of text that I have written. I have solved the problem of these variables appearing in the same order on each program run.
The issue I have is that they now appear randomly every time: it picks one out of 10 every time, instead of one out of 10 the first time and one out of 9 the next. I can't seem to find out how to exclude the previous ones.
thelist = [0]
while i < x:
    if number in thelist:
        >>>repeat<<<
    else:
        thelist.append (number)
        if ( number == x0 ):
            >>>something<<<
        elif ( number == x1 ):
            >>>something<<<
This is what I imagine the code would look like: every time you loop, one more number gets appended to the list, so whenever it picks a number already in the list it repeats the loop, until it has used all the numbers that random.randint can produce.
Here's a shuffle function:
import random
max = 15
x = list(range(max+1))
for i in range(max, 0, -1):
    n = random.randint(0, i)
    x[n], x[i] = x[i], x[n]
This starts with a sorted list of numbers [0, 1, ... max].
Then, it chooses a number from index 0 to index max, and swaps it with index max.
Then, it chooses a number from index 0 to index max-1, and swaps it with index max-1.
And so on, for max-2, max-3, ... 1
As yosukesabai rightly notes, this has the same effect as calling random.sample(range(max+1), max+1). This picks max + 1 unique random values from range(max+1). In other words, it just shuffles the order around. Docs: http://docs.python.org/2/library/random.html#random.sample
If you wanted something more along the lines of your proposed algorithm, you could do:
import random
max = 15
l = []
for _ in range(max+1):
    n = random.randint(0,max)
    while n in l:
        n = random.randint(0,max)
    l.append(n)
From what I understand of your description and sample code, you want thelist to end up with every integer between x0 and xn in a random order. If so, you can achieve that very simply with random.shuffle(), which shuffles a list in place:
import random
x0 = 5
xn = 15
# xn + 1 so that xn itself is included, matching random.randint(x0, xn)
full_range = list(range(x0, xn + 1))
print(full_range)
random.shuffle(full_range)
print(full_range)
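If the real goal is to show the 10 pieces of text once each in a random order, it may be simpler to shuffle the texts themselves rather than numbers. A minimal sketch, assuming the texts can be kept in a list (the texts list and the function name here are placeholders, not from the question):

```python
import random

def in_random_order(items, rng=random):
    """Return a new list holding the same items in random order, each exactly once."""
    # random.sample with k == len(items) is a shuffle that leaves the input untouched.
    return rng.sample(items, len(items))

# Hypothetical stand-ins for the 10 pieces of text.
texts = ["text %d" % k for k in range(10)]

for text in in_random_order(texts):
    print(text)  # each text appears once, in a different order on each run
```

This avoids the if/elif chain on the random number altogether, since the loop variable already holds the chosen text.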