Combinations of expressions with 4 elementary operations - python

I could not come up with a better title, for an adequate one might require the whole explanation. Also, combinations could be misleading since the problem will involve permutations.
What I want to accomplish is to outperform a brute force approach in Python at the following problem: Given the 4 elementary operations [+,-,*,/] and the digits from 1 to 9, and given all the possible combinations of 5 digits and the 4 operations without repetition (permutations) that result in a given number (treated as an integer), as in 1+5*9-3/7=45, 1-2/3+9*5=45,... obtain all the integers from the lowest possible value to the highest possible value and find out wether all the integers in the space expanse exist.
My preliminary attempt with brute force is the following:
def brute_force(target):
temp = 0
x = [i for i in range(1,10)]
numbers = [str(i) for i in x]
operators = ["+","-","*","/"]
for values in permutations(numbers,5):
for oper in permutations(operators):
formula = "".join(o + v for o,v in zip([""]+list(oper),values))
if round(eval(formula)) == int(target):
temp += 1
if temp > 0:
return True
else:
return False
for i in range(-100,100):
total = brute_force(i)
if total:
print(i)
else:
print(str(i) + 'No')
It just prints 'No' besides the integers that have not been found. As may seem obvious, all integer values can be found in the space expanse, ranging between -71 to 79.
I am sort of a newcommer both with Python and with algorithmic implementation, but I think that the algorithm has complexity O(n!), judging by the fact that permutations are involved. But if that is not the case I nevertheless want an algorithm that performs better (such as recursion or dynamic programming).

Let's compute the set of possible results only once (and in a bit simpler and faster way):
expression = [None] * 9
results = {eval(''.join(expression))
for expression[::2] in permutations('123456789', 5)
for expression[1::2] in permutations('+-*/')}
It computes all possible results in about 4.5 seconds on my laptop. Yours rewritten like this takes about 5.5 seconds. Both of which are much faster than your way of redoing all calculations for every target integer.
Using that results set, we can then answer questions instantaneously, confirming your range and showing that only -70 and 78 are missing:
>>> min(results), max(results)
(-70.71428571428571, 78.83333333333333)
>>> set(range(-70, 79)) - results
{-70, 78}

First of all, let's look at the expression analytically. You have three terms: a product P (A*B), a quotient Q (A/B), and a scalar S. You combine these with an addition and a subtraction.
Two of the terms are positive; the other is negative, so we can simply negate one of the three terms (P, Q, S) and take the sum. This cuts down the combinatorics.
Multiplication is commutative; w.l.o.g, we can assume A>B, which cuts the permutations in half.
Here are my suggestion for first efficiency:
First choose the terms of the product with A>B; 36 combinations
Then choose S from the remaining 7 digits; 7*36=252 combinations
From the last 6 digits, the possible quotients range from less-than-1 through max_digit / min_digit. Group these into equivalence classes (one set for addition, one for subtraction), rather than running through all 30 permutations. This gives us roughly 6 values per case; we now have ~1500 combinations of three terms.
For each of these combinations, we have 3 possible choices for which one to negate; total is ~4500 sums.
Is that enough improvement for a start?
Thanks to Heap Overflow for pointing out the data flow case I missed (this is professionally embarrassing :-) ).
The case A*B/C+D-E is not covered above. The approach is comparable.
First choose the terms of the product with A>B; 36 combinations
Then choose C from the remaining 7 digits; 7*36=252 combinations
There are only 38 total possible quotients; you can generate these as you wish, but with so few combinations, brute-force is also reasonable.
From the last 6 digits, you have 30 combinations, but half of them are negations of the other half. Choose D>E to start and merely make a second pass for the negative ones. Don't bother to check for duplicated differences; it's not worth the time.
You now have less than 38 quotients to combine with a quantity of differences (min 5, max 8, mean almost 7).
As it happens, a bit of examination of the larger cases (quotients with divisor of 1) and the remaining variety of digits will demonstrate that this method will cover all of the integers in the range -8 through 77, inclusive. You cannot remove 3 large numbers from the original 9 digits without leaving numbers whose difference omits needed intervals.
If you're allowed to include that analysis in your coding, you can shorten this part by reversing the search. You demonstrate the coverage for the large cases {48, 54, 56, 63, 72}, demonstrate the gap-filling for smaller quotients, and then you can search with less complication for the cases in my original posting, enjoying the knowledge that you need only 78, 79, and numbers less than -8.

I think you just need to find the permutations ONCE. Create a set out of all the possible sums. And then just do a lookup. Still sort of brute force but saves you a lot of repeated calculations.
def find_all_combinations():
x = [i for i in range(1,10)]
output_set = set()
numbers = [str(i) for i in x]
operators = ["+","-","*","/"]
print("Starting Calculations", end="")
for values in permutations(numbers,5):
for oper in permutations(operators):
formula = "".join(o + v for o,v in zip([""]+list(oper),values))
# Add all the possible outputs to a set
output_set.add(round(eval(formula)))
print(".", end="")
return output_set
output = find_all_combinations()
for i in range(-100,100):
if i in output:
print(i)
else:
print(str(i) + 'No')

Related

Most efficient way to cross compare millions of hash values in a list

I have a list of 9 million hash values. I need to compare each value(hash0) in the list with the rest of the values :
for i, hash0 in enumerate(hashes_list):
for hash1 in hashes_list[i:]:
if hash0 -hash1 < threshold:
#do something
This solution above is of quadratic complexity, which is taking forever to run (even in a server). What is an efficient way to cross-match these 9 million hashes?
Here is a sample of the hashes_list values :
8c59ac5169e673a6
ab9f545497b05683
9590ee98373e1e19
c1274a5e1e150e7f
938f7c782dc6241b
Assuming the subtraction is just a regular subtraction, try sorting first, Sorts can be O(n Ln(n)) time complexity which is a little better than n^2
That way you could iterate once with two pointers finding groups of hashes that are all close to each other. This would be n*k complexity with n being the number of hashes and k being the average number that match.
The pseudo code would look something like
sort(hashes_list) #large to small
count = size(hashes_list)
i = 0
while i < count:
j = i + 1
while hashes_list[i] - hashes_list[j] < threshold:
#do something
j += 1
i += 1
you might be able to skip the check in some cases. For example where 0 - 10 all are within the threshold, then 1-10 would also be and the "#do something" would just have to be called for each without another check
Since you don't want to compare exact matches of your values, that rules out easily using sets or dicts -
but you can certainly benefit from using a better data structure that could be more fit for the purpose.
If the value comparison you need is numeric, as it seems in your code, it looks like simply sorting the list (and sorting 9 million values is quite feasible), and comparing the neighbors in the result could suffice to reduce your complexity from O(n**2) to O(n).

how to calculate the minimum unfairness sum of a list

I have tried to summarize the problem statement something like this::
Given n, k and an array(a list) arr where n = len(arr) and k is an integer in set (1, n) inclusive.
For an array (or list) myList, The Unfairness Sum is defined as the sum of the absolute differences between all possible pairs (combinations with 2 elements each) in myList.
To explain: if mylist = [1, 2, 5, 5, 6] then Minimum unfairness sum or MUS. Please note that elements are considered unique by their index in list not their values
MUS = |1-2| + |1-5| + |1-5| + |1-6| + |2-5| + |2-5| + |2-6| + |5-5| + |5-6| + |5-6|
If you actually need to look at the problem statement, It's HERE
My Objective
given n, k, arr(as described above), find the Minimum Unfairness Sum out of all of the unfairness sums of sub arrays possible with a constraint that each len(sub array) = k [which is a good thing to make our lives easy, I believe :) ]
what I have tried
well, there is a lot to be added in here, so I'll try to be as short as I can.
My First approach was this where i used itertools.combinations to get all the possible combinations and statistics.variance to check its spread of data (yeah, I know I'm a mess).
Before you see the code below, Do you think these variance and unfairness sum are perfectly related (i know they are strongly related) i.e. the sub array with minimum variance has to be the sub array with MUS??
You only have to check the LetMeDoIt(n, k, arr) function. If you need MCVE, check the second code snippet below.
from itertools import combinations as cmb
from statistics import variance as varn
def LetMeDoIt(n, k, arr):
v = []
s = []
subs = [list(x) for x in list(cmb(arr, k))] # getting all sub arrays from arr in a list
i = 0
for sub in subs:
if i != 0:
var = varn(sub) # the variance thingy
if float(var) < float(min(v)):
v.remove(v[0])
v.append(var)
s.remove(s[0])
s.append(sub)
else:
pass
elif i == 0:
var = varn(sub)
v.append(var)
s.append(sub)
i = 1
final = []
f = list(cmb(s[0], 2)) # getting list of all pairs (after determining sub array with least MUS)
for r in f:
final.append(abs(r[0]-r[1])) # calculating the MUS in my messy way
return sum(final)
The above code works fine for n<30 but raised a MemoryError beyond that.
In Python chat, Kevin suggested me to try generator which is memory efficient (it really is), but as generator also generates those combination on the fly as we iterate over them, it was supposed to take over 140 hours (:/) for n=50, k=8 as estimated.
I posted the same as a question on SO HERE (you might wanna have a look to understand me properly - it has discussions and an answer by fusion which takes me to my second approach - a better one(i should say fusion's approach xD)).
Second Approach
from itertools import combinations as cmb
def myvar(arr): # a function to calculate variance
l = len(arr)
m = sum(arr)/l
return sum((i-m)**2 for i in arr)/l
def LetMeDoIt(n, k, arr):
sorted_list = sorted(arr) # i think sorting the array makes it easy to get the sub array with MUS quickly
variance = None
min_variance_sub = None
for i in range(n - k + 1):
sub = sorted_list[i:i+k]
var = myvar(sub)
if variance is None or var<variance:
variance = var
min_variance_sub=sub
final = []
f = list(cmb(min_variance_sub, 2)) # again getting all possible pairs in my messy way
for r in f:
final.append(abs(r[0] - r[1]))
return sum(final)
def MainApp():
n = int(input())
k = int(input())
arr = list(int(input()) for _ in range(n))
result = LetMeDoIt(n, k, arr)
print(result)
if __name__ == '__main__':
MainApp()
This code works perfect for n up to 1000 (maybe more), but terminates due to time out (5 seconds is the limit on online judge :/ ) for n beyond 10000 (the biggest test case has n=100000).
=====
How would you approach this problem to take care of all the test cases in given time limits (5 sec) ? (problem was listed under algorithm & dynamic programming)
(for your references you can have a look on
successful submissions(py3, py2, C++, java) on this problem by other candidates - so that you can
explain that approach for me and future visitors)
an editorial by the problem setter explaining how to approach the question
a solution code by problem setter himself (py2, C++).
Input data (test cases) and expected output
Edit1 ::
For future visitors of this question, the conclusions I have till now are,
that variance and unfairness sum are not perfectly related (they are strongly related) which implies that among a lots of lists of integers, a list with minimum variance doesn't always have to be the list with minimum unfairness sum. If you want to know why, I actually asked that as a separate question on math stack exchange HERE where one of the mathematicians proved it for me xD (and it's worth taking a look, 'cause it was unexpected)
As far as the question is concerned overall, you can read answers by archer & Attersson below (still trying to figure out a naive approach to carry this out - it shouldn't be far by now though)
Thank you for any help or suggestions :)
You must work on your list SORTED and check only sublists with consecutive elements. This is because BY DEFAULT, any sublist that includes at least one element that is not consecutive, will have higher unfairness sum.
For example if the list is
[1,3,7,10,20,35,100,250,2000,5000] and you want to check for sublists with length 3, then solution must be one of [1,3,7] [3,7,10] [7,10,20] etc
Any other sublist eg [1,3,10] will have higher unfairness sum because 10>7 therefore all its differences with rest of elements will be larger than 7
The same for [1,7,10] (non consecutive on the left side) as 1<3
Given that, you only have to check for consecutive sublists of length k which reduces the execution time significantly
Regarding coding, something like this should work:
def myvar(array):
return sum([abs(i[0]-i[1]) for i in itertools.combinations(array,2)])
def minsum(n, k, arr):
res=1000000000000000000000 #alternatively make it equal with first subarray
for i in range(n-k):
res=min(res, myvar(l[i:i+k]))
return res
I see this question still has no complete answer. I will write a track of a correct algorithm which will pass the judge. I will not write the code in order to respect the purpose of the Hackerrank challenge. Since we have working solutions.
The original array must be sorted. This has a complexity of O(NlogN)
At this point you can check consecutive sub arrays as non-consecutive ones will result in a worse (or equal, but not better) "unfairness sum". This is also explained in archer's answer
The last check passage, to find the minimum "unfairness sum" can be done in O(N). You need to calculate the US for every consecutive k-long subarray. The mistake is recalculating this for every step, done in O(k), which brings the complexity of this passage to O(k*N). It can be done in O(1) as the editorial you posted shows, including mathematic formulae. It requires a previous initialization of a cumulative array after step 1 (done in O(N) with space complexity O(N) too).
It works but terminates due to time out for n<=10000.
(from comments on archer's question)
To explain step 3, think about k = 100. You are scrolling the N-long array and the first iteration, you must calculate the US for the sub array from element 0 to 99 as usual, requiring 100 passages. The next step needs you to calculate the same for a sub array that only differs from the previous by 1 element 1 to 100. Then 2 to 101, etc.
If it helps, think of it like a snake. One block is removed and one is added.
There is no need to perform the whole O(k) scrolling. Just figure the maths as explained in the editorial and you will do it in O(1).
So the final complexity will asymptotically be O(NlogN) due to the first sort.

Calculating the nth Fibonacci number with a sum of factorials?

During my efforts to solve leetcode 70, climbing stairs, I thought of an answer that uses factorials (how many different ways to form a line with identical elements). Later, I thought that the question could also be solved using fibonacci, and the first few answers appear to be the same.
My questions are:
Is there a formal mathematical proof (or an intuitive explanation) as to why these two might be the same? or if they are the same?
the climbing stairs problem is a problem that asks you to find the number of ways a person only taking one or two steps can climb up a stair with n stairs.
Since this is just a way of saying how many different ways can you line up 1 and 2 that sum up to n. I solved the questions that way.
max_one, and max_two are maximum numbers of 1s or 2s that can exist and I iterated over how many 2s could be in the line, finding how every different line can be formed given number of 2s (represented as i)
the numerator is how many elements in line, the denominator is how many identical elements ( how many 1s, how many 2s) are in the line:
from math import factorial as fac
class Solution:
def climbStairs(self, n):
if n == 1: return 1
if n == 2: return 2
max_two = n//2
max_one = n
ways = 0
for i in range(0,max_two+1):
ways += fac(max_one-i)/(fac(max_one-2*i) * fac(i))
return int(ways)
Congratulations! You’ve rediscovered a very cool identity involving the Fibonacci numbers. Specifically, the rule you’ve found is what’s visually demonstrated in this picture, where adding up terms of Pascal’s triangle along these skew diagonals gives back the Fibonacci sequence:
To see why this is, notice that each term in your summation is equal to (n - i) choose i, which is the entry in row (n - i), column i of Pascal’s triangle. Each term in the summation corresponds to moving up one row and over one column, hence the angle on those lines.
As for a proof, the argument that you’ve outlined is essentially the basis of a double-counting argument. We know that the Fibonacci numbers count the number of ways to walk down a flight of stairs taking steps of sizes one and two, which we can rigorously prove by induction. Additionally, we know that your summation counts the same quantity, as you’re enumerating all the ways to take paths down using 0, 1, 2, 3, ..., etc. steps of size two. Since we’re counting the same quantity in two ways, these two expressions must be equal.
If that doesn’t suffice, you can formalize this using a proof by induction, using the identity (n choose k) = (n-1 choose k) + (n-1 choose k-1) and splitting into cases where n is even and where n is odd.
Hope this helps!

Best way to hash ordered permutation of [1,9]

I'm trying to implement a method to keep the visited states of the 8 puzzle from generating again.
My initial approach was to save each visited pattern in a list and do a linear check each time the algorithm wants to generate a child.
Now I want to do this in O(1) time through list access. Each pattern in 8 puzzle is an ordered permutation of numbers between 1 to 9 (9 being the blank block), for example 125346987 is:
1 2 5
3 4 6
_ 8 7
The number of all of the possible permutation of this kind is around 363,000 (9!). what is the best way to hash these numbers to indexes of a list of that size?
You can map a permutation of N items to its index in the list of all permutations of N items (ordered lexicographically).
Here's some code that does this, and a demonstration that it produces indexes 0 to 23 once each for all permutations of a 4-letter sequence.
import itertools
def fact(n):
r = 1
for i in xrange(n):
r *= i + 1
return r
def I(perm):
if len(perm) == 1:
return 0
return sum(p < perm[0] for p in perm) * fact(len(perm) - 1) + I(perm[1:])
for p in itertools.permutations('abcd'):
print p, I(p)
The best way to understand the code is to prove its correctness. For an array of length n, there's (n-1)! permutations with the smallest element of the array appearing first, (n-1)! permutations with the second smallest element appearing first, and so on.
So, to find the index of a given permutation, see count how many items are smaller than the first thing in the permutation and multiply that by (n-1)!. Then recursively add the index of the remainder of the permutation, considered as a permutation of (n-1) elements. The base case is when you have a permutation of length 1. Obviously there's only one such permutation, so its index is 0.
A worked example: [1324].
[1324]: 1 appears first, and that's the smallest element in the array, so that gives 0 * (3!)
Removing 1 gives us [324]. The first element is 3. There's one element that's smaller, so that gives us 1 * (2!).
Removing 3 gives us [24]. The first element is 2. That's the smallest element remaining, so that gives us 0 * (1!).
Removing 2 gives us [4]. There's only one element, so we use the base case and get 0.
Adding up, we get 0*3! + 1*2! + 0*1! + 0 = 1*2! = 2. So [1324] is at index 2 in the sorted list of 4 permutations. That's correct, because at index 0 is [1234], index 1 is [1243], and the lexicographically next permutation is our [1324].
I believe you're asking for a function to map permutations to array indices. This dictionary maps all permutations of numbers 1-9 to values from 0 to 9!-1.
import itertools
index = itertools.count(0)
permutations = itertools.permutations(range(1, 10))
hashes = {h:next(index) for h in permutations}
For example, hashes[(1,2,5,3,4,6,9,8,7)] gives a value of 1445.
If you need them in strings instead of tuples, use:
permutations = [''.join(x) for x in itertools.permutations('123456789')]
or as integers:
permutations = [int(''.join(x)) for x in itertools.permutations('123456789')]
It looks like you are only interested in whether or not you have already visited the permutation.
You should use a set. It grants the O(1) look-up you are interested in.
A space as well lookup efficient structure for this problem is a trie type structure, as it will use common space for lexicographical matches in any
permutation.
i.e. the space used for "123" in 1234, and in 1235 will be the same.
Lets assume 0 as replacement for '_' in your example for simplicity.
Storing
Your trie will be a tree of booleans, the root node will be an empty node, and then each node will contain 9 children with a boolean flag set to false, the 9 children specify digits 0 to 8 and _ .
You can create the trie on the go, as you encounter a permutation, and store the encountered digits as boolean in the trie by setting the bool as true.
Lookup
The trie is traversed from root to children based on digits of the permutation, and if the nodes have been marked as true, that means the permutation has occured before. The complexity of lookup is just 9 node hops.
Here is how the trie would look for a 4 digit example :
Python trie
This trie can be easily stored in a list of booleans, say myList.
Where myList[0] is the root, as explained in the concept here :
https://webdocs.cs.ualberta.ca/~holte/T26/tree-as-array.html
The final trie in a list would be around 9+9^2+9^3....9^8 bits i.e. less than 10 MB for all lookups.
Use
I've developed a heuristic function for this specific case. It is not a perfect hashing, as the mapping is not between [0,9!-1] but between [1,767359], but it is O(1).
Let's assume we already have a file / reserved memory / whatever with 767359 bits set to 0 (e.g., mem = [False] * 767359). Let a 8puzzle pattern be mapped to a python string (e.g., '125346987'). Then, the hash function is determined by:
def getPosition( input_str ):
data = []
opts = range(1,10)
n = int(input_str[0])
opts.pop(opts.index(n))
for c in input_str[1:len(input_str)-1]:
k = opts.index(int(c))
opts.pop(k)
data.append(k)
ind = data[3]<<14 | data[5]<<12 | data[2]<<9 | data[1]<<6 | data[0]<<3 | data[4]<<1 | data[6]<<0
output_str = str(ind)+str(n)
output = int(output_str)
return output
I.e., in order to check if a 8puzzle pattern = 125346987 has already been used, we need to:
pattern = '125346987'
pos = getPosition(pattern)
used = mem[pos-1] #mem starts in 0, getPosition in 1.
With a perfect hashing we would have needed 9! bits to store the booleans. In this case we need 2x more (767359/9! = 2.11), but recall that it is not even 1Mb (barely 100KB).
Note that the function is easily invertible.
Check
I could prove you mathematically why this works and why there won't be any collision, but since this is a programming forum let's just run it for every possible permutation and check that all the hash values (positions) are indeed different:
def getPosition( input_str ):
data = []
opts = range(1,10)
n = int(input_str[0])
opts.pop(opts.index(n))
for c in input_str[1:len(input_str)-1]:
k = opts.index(int(c))
opts.pop(k)
data.append(k)
ind = data[3]<<14 | data[5]<<12 | data[2]<<9 | data[1]<<6 | data[0]<<3 | data[4]<<1 | data[6]<<0
output_str = str(ind)+str(n)
output = int(output_str)
return output
#CHECKING PURPOSES
def addperm(x,l):
return [ l[0:i] + [x] + l[i:] for i in range(len(l)+1) ]
def perm(l):
if len(l) == 0:
return [[]]
return [x for y in perm(l[1:]) for x in addperm(l[0],y) ]
#We generate all the permutations
all_perms = perm([ i for i in range(1,10)])
print "Number of all possible perms.: "+str(len(all_perms)) #indeed 9! = 362880
#We execute our hash function over all the perms and store the output.
all_positions = [];
for permutation in all_perms:
perm_string = ''.join(map(str,permutation))
all_positions.append(getPosition(perm_string))
#We wan't to check if there has been any collision, i.e., if there
#is one position that is repeated at least twice.
print "Number of different hashes: "+str(len(set(all_positions)))
#also 9!, so the hash works properly.
How does it work?
The idea behind this has to do with a tree: at the beginning it has 9 branches going to 9 nodes, each corresponding to a digit. From each of these nodes we have 8 branches going to 8 nodes, each corresponding to a digit except its parent, then 7, and so on.
We first store the first digit of our input string in a separate variable and pop it out from our 'node' list, because we have already taken the branch corresponding to the first digit.
Then we have 8 branches, we choose the one corresponding with our second digit. Note that, since there are 8 branches, we need 3 bits to store the index of our chosen branch and the maximum value it can take is 111 for the 8th branch (we map branch 1-8 to binary 000-111). Once we have chosen and store the branch index, we pop that value out, so that the next node list doesn't include again this digit.
We proceed in the same way for branches 7, 6 and 5. Note that when we have 7 branches we still need 3 bits, though the maximum value will be 110. When we have 5 branches, the index will be at most binary 100.
Then we get to 4 branches and we notice that this can be stored just with 2 bits, same for 3 branches. For 2 branches we will just need 1bit, and for the last branch we don't need any bit: there will be just one branch pointing to the last digit, which will be the remaining from our 1-9 original list.
So, what we have so far: the first digit stored in a separated variable and a list of 7 indexes representing branches. The first 4 indexes can be represented with 3bits, the following 2 indexes can be represented with 2bits and the last index with 1bit.
The idea is to concatenate all this indexes in their bit form to create a larger number. Since we have 17bits, this number will be at most 2^17=131072. Now we just add the first digit we had stored to the end of that number (at most this digit will be 9) and we have that the biggest number we can create is 1310729.
But we can do better: recall that when we had 5 branches we needed 3 bits, though the maximum value was binary 100. What if we arrange our bits so that those with more 0s come first? If so, in the worst case scenario our final bit number will be the concatenation of:
100 10 101 110 111 11 1
Which in decimal is 76735. Then we proceed as before (adding the 9 at the end) and we get that our biggest possible generated number is 767359, which is the ammount of bits we need and corresponds to input string 987654321, while the lowest possible number is 1 which corresponds to input string 123456789.
Just to finish: one might wonder why have we stored the first digit in a separate variable and added it at the end. The reason is that if we had kept it then the number of branches at the beginning would have been 9, so for storing the first index (1-9) we would have needed 4 bits (0000 to 1000). which would have make our mapping much less efficient, as in that case the biggest possible number (and therefore the amount of memory needed) would have been
1000 100 10 101 110 111 11 1
which is 1125311 in decimal (1.13Mb vs 768Kb). It is quite interesting to see that the ratio 1.13M/0.768K = 1.47 has something to do with the ratio of the four bits compared to just adding a decimal value (2^4/10 = 1.6) which makes a lot of sense (the difference is due to the fact that with the first approach we are not fully using the 4 bits).
First. There is nothing faster than a list of booleans. There's a total of 9! == 362880 possible permutations for your task, which is a reasonably small amount of data to store in memory:
visited_states = [False] * math.factorial(9)
Alternatively, you can use array of bytes which is slightly slower (not by much though) and has a much lower memory footprint (by a power of magnitude at least). However any memory savings from using an array will probably be of little value considering the next step.
Second. You need to convert your specific permutation to it's index. There are algorithms which do this, one of the best StackOverflow questions on this topic is probably this one:
Finding the index of a given permutation
You have fixed permutation size n == 9, so whatever complexity an algorithm has, it will be equivalent to O(1) in your situation.
However to produce even faster results, you can pre-populate a mapping dictionary which will give you an O(1) lookup:
all_permutations = map(lambda p: ''.join(p), itertools.permutations('123456789'))
permutation_index = dict((perm, index) for index, perm in enumerate(all_permutations))
This dictionary will consume about 50 Mb of memory, which is... not that much actually. Especially since you only need to create it once.
After all this is done, checking your specific combination is done with:
visited = visited_states[permutation_index['168249357']]
Marking it to visited is done in the same manner:
visited_states[permutation_index['168249357']] = True
Note that using any of permutation index algorithms will be much slower than mapping dictionary. Most of those algorithms are of O(n2) complexity and in your case it results 81 times worse performance even discounting the extra python code itself. So unless you have heavy memory constraints, using mapping dictionary is probably the best solution speed-wise.
Addendum. As has been pointed out by Palec, visited_states list is actually not needed at all - it's perfectly possible to store True/False values directly in the permutation_index dictionary, which saves some memory and an extra list lookup.
Notice if you type hash(125346987) it returns 125346987. That is for a reason, because there is no point in hashing an integer to anything other than an integer.
What you should do, is when you find a pattern add it to a dictionary rather than a list. This will provide the fast lookup you need rather than traversing the list like you are doing now.
So say you find the pattern 125346987 you can do:
foundPatterns = {}
#some code to find the pattern
foundPatterns[1] = 125346987
#more code
#test if there?
125346987 in foundPatterns.values()
True
If you must always have O(1), then seems like a bit array would do the job. You'd only need to store 363,000 elements, which seems doable. Though note that in practice it's not always faster. Simplest implementation looks like:
Create data structure
visited_bitset = [False for _ in xrange(373000)]
Test current state and add if not visited yet
if !visited[current_state]:
visited_bitset[current_state] = True
Paul's answer might work.
Elisha's answer is perfectly valid hash function that would guarantee that no collision happen in the hash function. The 9! would be a pure minimum for a guaranteed no collision hash function, but (unless someone corrects me, Paul probably has) I don't believe there exists a function to map each board to a value in the domain [0, 9!], let alone a hash function that is nothing more that O(1).
If you have a 1GB of memory to support a Boolean array of 864197532 (aka 987654321-12346789) indices. You guarantee (computationally) the O(1) requirement.
Practically (meaning when you run in a real system) speaking this isn't going to be cache friendly but on paper this solution will definitely work. Even if an perfect function did exist, doubt it too would be cache friendly either.
Using prebuilts like set or hashmap (sorry I haven't programmed Python in a while, so don't remember the datatype) must have an amortized 0(1). But using one of these with a suboptimal hash function like n % RANDOM_PRIME_NUM_GREATER_THAN_100000 might give the best solution.

Can Python generate a random number that excludes a set of numbers, without using recursion?

I looked over Python Docs (I may have misunderstood), but I didn't see that there was a way to do this (look below) without calling a recursive function.
What I'd like to do is generate a random value which excludes values in the middle.
In other words,
Let's imagine I wanted X to be a random number that's not in
range(a - b, a + b)
Can I do this on the first pass,
or
1. Do I have to constantly generate a number,
2. Check if in range(),
3. Wash rinse ?
As for why I don't wish to write a recursive function,
1. it 'feels like' I should not have to
2. the set of numbers I'm doing this for could actually end up being quite large, and
... I hear stack overflows are bad, and I might just be being overly cautious in doing this.
I'm sure that there's a nice, Pythonic, non-recursive way to do it.
Generate one random number and map it onto your desired ranges of numbers.
If you wanted to generate an integer between 1-4 or 7-10, excluding 5 and 6, you might:
Generate a random integer in the range 1-8
If the random number is greater than 4, add 2 to the result.
The mapping becomes:
Random number: 1 2 3 4 5 6 7 8
Result: 1 2 3 4 7 8 9 10
Doing it this way, you never need to "re-roll". The above example is for integers, but it can also be applied to floats.
Use random.choice().
In this example, a is your lower bound, the range between b and c is skipped and d is your upper bound.
import random
numbers = range(a,b) + range(c,d)
r = random.choice(numbers)
A possible solution would be to just shift the random numbers out of that range. E.g.
def NormalWORange(a, b, sigma):
r = random.normalvariate(a,sigma)
if r < a:
return r-b
else:
return r+b
That would generate a normal distribution with a hole in the range (a-b,a+b).
Edit: If you want integers then you will need a little bit more work. If you want integers that are in the range [c,a-b] or [a+b,d] then the following should do the trick.
def RangeWORange(a, b, c, d):
r = random.randrange(c,d-2*b) # 2*b because two intervals of length b to exclude
if r >= a-b:
return r+2*b
else:
return r
I may have misunderstood your problem, but you can implement this without recursion
def rand(exclude):
r = None
while r in exclude or r is None:
r = random.randrange(1,10)
return r
rand([1,3,9])
though, you're still looping over results until you find new ones.
The fastest solution would be this (with a and b defining the exclusion zone and c and d the set of good answers including the exclusion zone):
offset = b - a
maximum = d - offset
result = random.randrange(c, maximum)
if result >= a:
result += offset
You still need some range, i.e., a min-max possible value excluding your middle values.
Why don't you first randomly pick which "half" of the range you want, then pick a random number in that range? E.g.:
def rand_not_in_range(a,b):
rangechoices = ((0,a-b-1),(a+b+1, 10000000))
# Pick a half
fromrange = random.choice(rangechoices)
# return int from that range
return random.randint(*fromrange)
Li-aung Yip's answer makes the recursion issue moot, but I have to point out that it's possible to do any degree of recursion without worrying about the stack. It's called "tail recursion". Python doesn't support tail recursion directly, because GvR thinks it's uncool:
http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html
But you can get around this:
http://paulbutler.org/archives/tail-recursion-in-python/
I find it interesting that stick thinks that recursion "feels wrong". In extremely function-oriented languages, such as Scheme, recursion is unavoidable. It allows you to do iteration without creating state variables, which the functional programming paradigm rigorously avoids.
http://www.pling.org.uk/cs/pop.html

Categories

Resources