Sometimes my set comes out ordered and sometimes not (Python)

I know that a set is supposed to be an unordered collection. I am trying to do some coding of my own and ran into something odd: my set will sometimes come out in order from 1 to 100 (when I use a larger count of numbers), but when I use a smaller count it stays unordered. Why is that?
# Steps:
# 1) Take a number value for the total of random numbers in 1-100
# 2) Put those numbers into a set (which will remove duplicates)
# 3) Print that set and the total number of random numbers
import random

randomnums = 0
Min = 1
Max = 100
print('How many random numbers would you like?')
numsneeded = int(input('Please enter a number. '))
print("\n" * 25)
s = set()
while randomnums < numsneeded:
    number = random.randint(Min, Max)
    s.add(number)
    randomnums = randomnums + 1
print(s)
print(len(s))
If anyone has any pointers on cleaning up my code I am 100% willing to learn. Thank you for your time!

When the documentation for set says it is an unordered collection, it only means that you can assume no specific order on the elements of the set. The set can choose what internal representation it uses to hold the data, and when you ask for the elements, they might come back in any order at all. The fact that they are sorted in some cases might mean that the set has chosen to store your elements in a sorted manner.
The set can make tradeoff decisions between performance and space depending on factors such as the number of elements in the set. For example, it could store small sets in a list, but larger sets in a tree. The most natural way to retrieve elements from a tree is in sorted order, so that's what could be happening for you.
See also "Can Python's set absence of ordering be considered random order?" for further info about this.

Sets are implemented with a hash table. The hash of an integer is just the integer itself. To determine where to put the number in the table, the remainder of the integer divided by the table size is used. The table starts with a size of 8, so the numbers 0 to 7 would be placed in their own slots in order, but 8 would be placed in slot 0. If you add the numbers 1 to 4 and 8 into an empty set it will display as:
set([8,1,2,3,4])
What happens when 5 is added is that the table has exceeded 2/3rds full. At that point the table is increased in size to 32. When creating the new table the existing table is repopulated into the new table. Now it displays as:
set([1,2,3,4,5,8])
In your example, as long as you've added enough entries to cause the table to grow to 128 slots, they will all be placed in their own slots, in order. If you've only added enough entries that the table has 32 slots, but you are using numbers up to 100, the items won't necessarily be in order.
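If you want to see the effect yourself, here is a small demonstration; the exact resize thresholds and the printed order are CPython implementation details and differ between versions, so the comments below only describe the general behaviour:
print(hash(100) == 100)   # True: small ints hash to themselves
s = set()
for n in (32, 1, 2, 3):   # 32 % 8 == 0, so with an 8-slot table it lands in slot 0
    s.add(n)
print(s)                  # iteration follows the table slots, not insertion or sorted order
s.update(range(4, 20))    # growing the set forces one or more resizes...
print(s)                  # ...after which every number can end up in its own slot, in order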


How to generate unique(!) arrays/lists/sequences of uniformly distributed random numbers

Let's say I generate a pack, i.e., a one-dimensional array of 10 random numbers, with a random generator. Then I generate another array of 10 random numbers. I do this X times. How can I generate unique arrays, so that even after a trillion generations, no array is equal to another?
Within one array, the elements can be duplicates. The array just has to differ from the other arrays in at least one element.
Is there any numpy method for this? Is there some special algorithm which works differently by exploring some space for the random generation? I don't know.
One easy answer would be to write the arrays to a file and check if they were generated already, but the I/O operations on an ever-growing file take far too much time.
This is a difficult request, since one of the properties of an RNG is that it can repeat values at random.
You also have the problem of trying to record terabytes of prior results. One thing you could try is to form a hash table (for search speed) of the existing arrays. Using this depends heavily on whether you have sufficient RAM to hold the entire list.
If not, you might consider disk-mapping a fast search structure of some sort. For instance, you could implement an on-disk binary tree of hash keys, re-balancing whenever you double the size of the tree (with insertions). This lets you keep the file open and find entries via seek, rather than needing to represent the full file in memory.
You could also maintain an in-memory index to the table, using that to drive your seek to the proper file section, then reading only a small subset of the file for the final search.
Does that help focus your implementation?
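To make the seek-based idea concrete, here is a minimal sketch that binary-searches a file of sorted, fixed-size hash records; this is a simpler variant of the on-disk tree described above (the record layout and names are assumptions), and it only shows the lookup side — keeping the file sorted on insertion is the part a tree or index structure addresses:
import os

RECORD = 8   # assumption: each prior result is stored as an 8-byte big-endian hash

def contains(path, key_hash):
    """Binary-search a sorted file of fixed-size records without loading it into memory."""
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // RECORD
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD)                      # jump straight to the record via seek
            value = int.from_bytes(f.read(RECORD), "big")
            if value == key_hash:
                return True
            if value < key_hash:
                lo = mid + 1
            else:
                hi = mid
    return False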
Assume that the 10 numbers in a pack are each in the range [0..max]. Each pack can then be considered as a 10 digit number in base max+1. Obviously, the size of max determines how many unique packs there are. For example, if max=9 there are 10,000,000,000 possible unique packs from [0000000000] to [9999999999].
The problem then comes down to generating unique numbers in the correct range.
Given your "trillions" then the best way to generate guaranteed unique numbers in the range is probably to use an encryption with the correct size output. Unless you want 64 bit (DES) or 128 bit (AES) output then you will need some sort of format preserving encryption to get output in the range you want.
For input, just encrypt the numbers 0, 1, 2, ... in turn. Encryption guarantees that, given the same key, the output is unique for each unique input. You just need to keep track of how far you have got with the input numbers. Given that, you can generate more unique packs as needed, within the limit imposed by max. After that point the output will start repeating.
Obviously as a final step you need to convert the encryption output to a 10 digit base max+1 number and put it into an array.
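As a small illustration of that last step, here is a sketch of the pack-to-number correspondence described above (the function names are illustrative, not part of any library):
def number_to_pack(number, max_value, length=10):
    """Write `number` as `length` digits in base max_value + 1, least significant digit first."""
    base = max_value + 1
    pack = []
    for _ in range(length):
        number, digit = divmod(number, base)
        pack.append(digit)
    return pack

def pack_to_number(pack, max_value):
    base = max_value + 1
    number = 0
    for digit in reversed(pack):
        number = number * base + digit
    return number

p = number_to_pack(1234567890, max_value=9)   # [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]
assert pack_to_number(p, max_value=9) == 1234567890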
Important caveat:
This will not allow you to generate "arbitrarily" many unique packs. Please see the limits highlighted by @Prune.
Note that as the number of requested packs approaches the number of unique packs this takes longer and longer to find a pack. I also put in a safety so that after a certain number of tries it just gives up.
Feel free to adjust:
import random

## -----------------------
## Build a unique pack generator
## -----------------------
def build_pack_generator(pack_length, min_value, max_value, max_attempts):
    existing_packs = set()

    def _generator():
        pack = tuple(random.randint(min_value, max_value) for _ in range(1, pack_length + 1))
        pack_hash = hash(pack)
        attempts = 1
        while pack_hash in existing_packs:
            if attempts >= max_attempts:
                raise KeyError("Unable to find a valid pack")
            pack = tuple(random.randint(min_value, max_value) for _ in range(1, pack_length + 1))
            pack_hash = hash(pack)
            attempts += 1
        existing_packs.add(pack_hash)
        return list(pack)

    return _generator

generate_unique_pack = build_pack_generator(2, 1, 9, 1000)
## -----------------------

for _ in range(50):
    print(generate_unique_pack())
The Birthday problem suggests that at some point you don't need to bother checking for duplicates. For example, if each value in a 10 element "pack" can take on more than ~250 values then you only have a 50% chance of seeing a duplicate after generating 1e12 packs. The more distinct values each element can take on the lower this probability.
You've not specified what these random values are in this question (other than being uniformly distributed) but your linked question suggests they are Python floats. Hence each number has 2**53 distinct values it can take on, and the resulting probability of seeing a duplicate is practically zero.
There are a few ways of rearranging this calculation:
for a given amount of state and number of iterations what's the probability of seeing at least one collision
for a given amount of state how many iterations can you generate to stay below a given probability of seeing at least one collision
for a given number of iterations and probability of seeing a collision, what state size is required
The below Python code calculates option 3 as it seems closest to your question. The other options are available on the birthday attack page.
from math import log2, log1p

def birthday_state_size(size, p):
    # -log1p(-p) is a numerically stable version of log(1/(1-p))
    return size**2 / (2 * -log1p(-p))

log2(birthday_state_size(1e12, 1e-6))  # => ~100
So as long as you have more than 100 uniform bits of state in each pack everything should be fine. For example, two or more Python floats is OK (2 * 53), as is 10 integers with >= 1000 distinct values (10*log2(1000)).
You can of course reduce the probability down even further, but as noted in the Wikipedia article going below 1e-15 quickly approaches the reliability of a computer. This is why I say "practically zero" given the 530 bits of state provided by 10 uniformly distributed floats.
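For completeness, option 1 above (the probability of at least one collision for a given state size and number of iterations) can be sketched with the same standard approximation; the helper name is just illustrative:
from math import expm1

def collision_probability(n, state_size):
    # p ≈ 1 - exp(-n^2 / (2 * N)); expm1 keeps the tiny probabilities accurate
    return -expm1(-n**2 / (2 * state_size))

print(collision_probability(1e12, 2 ** (2 * 53)))    # two floats of state: ~6e-9
print(collision_probability(1e12, 2 ** (10 * 53)))   # ten floats: practically zero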

enumerate in dictionary loop takes a long time; how to improve the speed

I am using Python 3.x and I would like to speed up my code. In every loop I create new values and check whether they already exist in the dictionary; if a value exists, I keep the index where it was found. I am using enumerate for this, but it takes a long time. Is there another way to speed this up, or is enumerate the only option I have here? I am also not sure whether using numpy would be better in my case.
Here is my code:
# import numpy
import numpy as np

# my first array
my_array_1 = np.random.choice(np.linspace(-1000, 1000, 2 ** 8), size=(100, 3), replace=True)
my_array_1 = np.array(my_array_1)

# here I want to find the unique values from my_array_1
indx = np.unique(my_array_1, return_index=True, return_counts=True, axis=0)

# then save the result to a dictionary
dic_t = {"my_array_uniq": indx[0],  # unique values in my_array_1
         "counts": indx[2]}         # how many times each unique element appears in my_array_1

# here I want to create a random array 100 times
for i in range(100):
    print(i)
    # my 2nd array
    my_array_2 = np.random.choice(np.linspace(-1000, 1000, 2 ** 8), size=(100, 3), replace=True)
    my_array_2 = np.array(my_array_2)
    # I would like to check whether the values in my_array_2 exist or not in the dictionary (dic_t["my_array_uniq"])
    # if a value exists, I want to hold the index number of that value in the dictionary and
    # add 1 to dic_t["counts"], which means this value appears again, and count how many times.
    # if it does not exist, then add this value to dic_t["my_array_uniq"]
    # and also add 1 to dic_t["counts"]
    for i, a in enumerate(my_array_2):
        ix = [k for k, j in enumerate(dic_t["my_array_uniq"]) if (a == j).all()]
        if ix:
            print(50 * "*", i, "Yes", "at", ix[0])
            dic_t["counts"][ix[0]] += 1
        else:
            # print(50 * "*", i, "No")
            dic_t["counts"] = np.hstack((dic_t["counts"], 1))
            dic_t["my_array_uniq"] = np.vstack((dic_t["my_array_uniq"], my_array_2[i]))
Explanation:
1- I create an initial array.
2- Then I find the unique values, indices, and counts from the initial array using np.unique.
3- I save the result to the dictionary (dic_t).
4- Then I start the loop, creating random values 100 times.
5- I would like to check whether these random values in my_array_2 exist or not in the dictionary (dic_t["my_array_uniq"]).
6- If one of them exists, I want to hold the index number of that value in the dictionary.
7- Add 1 to dic_t["counts"], which means this value appears again, and count how many times.
8- If it does not exist, then add this value to the dictionary as a new unique value (dic_t["my_array_uniq"]).
9- Also add 1 to dic_t["counts"].
So from what I can see you are
Creating 256 random numbers from a linear distribution of numbers between -1000 and 1000
Generating 100 triplets from those (it could be fewer than 100 due to unique but with overwhelming probability it will be exactly 100)
Then doing pretty much the same thing 100 times and each time checking for each of the triplets in the new list whether they exist in the old list.
You're then trying to get a count of how often each element occurs.
I'm wondering why you're trying to do this, because it doesn't make much sense to me, but I'll give a few pointers:
There's no reason to make a dictionary dic_t if you're only going to hold two objects in it; just use two variables my_array_uniq and counts.
You're dealing with triplets of floating point numbers. In the given range, that should give you about 10^48 different possible triplets (I may be wrong on the exact number, but it's an absurdly large number either way). The way you're generating them does reduce the total phase-space a fair bit, but nowhere near enough. The probability of finding identical ones is very, very low.
If you have a set of objects (in this case number triplets) and you want to determine whether you have seen a given one before, you want to use sets. Sets can only contain immutable objects, so you want to turn your triplets into tuples. Determining whether a given triplet is already contained in your set is then an O(1) operation.
For counting the number of occurrences of something, collections.Counter is the natural data structure to use (a short sketch follows below).
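A minimal sketch of that set/Counter idea, reusing the shapes from the question (the variable names are only illustrative):
import numpy as np
from collections import Counter

values = np.linspace(-1000, 1000, 2 ** 8)
counts = Counter()

my_array_1 = np.random.choice(values, size=(100, 3), replace=True)
counts.update(map(tuple, my_array_1))        # rows are not hashable, tuples are

for _ in range(100):
    my_array_2 = np.random.choice(values, size=(100, 3), replace=True)
    counts.update(map(tuple, my_array_2))    # O(1) average-case update per triplet

# counts maps each distinct triplet to how many times it has been seen so far
print(len(counts), counts.most_common(3))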

Avoid generating duplicate values from random

I want to generate random numbers and store them in a list as the following:
alist = [random.randint(0, 2 ** mypower - 1) for _ in range(total)]
My concern is the following: I want to generate total=40 million values in the range (0, 2 ** mypower - 1). If mypower = 64, then alist will be of size ~20GB (40M*64*8), which is far too large for my laptop's memory. My idea is to iteratively generate chunks of values, say 5 million at a time, and save them to a file so that I don't have to generate all 40M values at once. My concern is: if I do that in a loop, is it guaranteed that random.randint(0, 2 ** mypower - 1) will not generate values that were already generated in a previous iteration? Something like this:
for i in range(num_of_chunks):
    alist = [random.randint(0, 2 ** mypower - 1) for _ in range(chunk)]
    # save to file
Well, since efficiency/speed doesn't matter, I think this will work:
s = set()
while len(s) < total:
    s.add(random.randint(0, 2 ** mypower - 1))
alist = list(s)
Since sets can only contain unique elements, I think this will work well enough.
To guarantee unique values you should avoid using random. Instead you should use an encryption. Because encryption is reversible, unique inputs guarantee unique outputs, given the same key. Encrypt the numbers 0, 1, 2, 3, ... and you will get guaranteed unique random-seeming outputs back providing you use a secure encryption. Good encryption is designed to give random-seeming output.
Keep track of the key (essential) and how far you have got. For your first batch encrypt integers 0..5,000,000. For the second batch encrypt 5,000,001..10,000,000 and so on.
You want 64 bit numbers, so use DES in ECB mode. DES is a 64-bit cipher, so the output from each encryption will be 64 bits. ECB mode does have a weakness, but that only applies with identical inputs. You are supplying unique inputs so the weakness is not relevant for your particular application.
If you need to regenerate the same numbers, just re-encrypt them with the same key. If you need a different set of random numbers (which will duplicate some from the first set) then use a different key. The guarantee of uniqueness only applies with a fixed key.
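A minimal sketch of the counter-plus-encryption idea, assuming the third-party pycryptodome package (pip install pycryptodome); this is just one possible shape, not the only way to do it:
import random
from Crypto.Cipher import DES

key = random.getrandbits(64).to_bytes(8, "big")   # keep this key: same key => same sequence
cipher = DES.new(key, DES.MODE_ECB)               # 64-bit block cipher, one block per number

def unique_batch(start, count):
    """Encrypt the counters start .. start+count-1; distinct inputs give distinct 64-bit outputs."""
    return [int.from_bytes(cipher.encrypt(i.to_bytes(8, "big")), "big")
            for i in range(start, start + count)]

batch_1 = unique_batch(0, 1000)       # in practice: counters 0 .. 4,999,999 for the first chunk
batch_2 = unique_batch(1000, 1000)    # then 5,000,000 .. 9,999,999, and so on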
One way to generate random values that don't repeat is first to create a list of contiguous values
l = list(range(1000))
then shuffle it:
import random
random.shuffle(l)
You could do that several times, and save it in a file, but you'll have limited ranges since you'll never see the whole picture because of your limited memory (it's like trying to sort a big list without having the memory for it)
As someone noted, to get a wide span of random numbers you'll need a lot of memory, so this is simple but not very efficient.
Another hack I just thought of: do the same as above, but generate the range using a step. Then, in a second pass, add a random offset to the values. Even if the offset values repeat, it's guaranteed never to generate the same number twice:
import random
step = 10
l = list(range(0,1000-step,step))
random.shuffle(l)
newlist = [x+random.randrange(0,step) for x in l]
with the required max value and number of iterations that gives:
import random
number_of_iterations = 40*10**6
max_number = 2**64
step = max_number//number_of_iterations
l = list(range(0,max_number-step,step))
random.shuffle(l)
newlist = [x+random.randrange(0,step) for x in l]
print(len(newlist),len(set(newlist)))
runs in 1-2 minutes on my laptop, and gives 40000000 distinct values (evenly scattered across the range)
Usually random number generators are not really random at all. In fact, this is quite helpful in some situations. If you want the values to be unique after the second iteration, send it a different seed value.
random.seed()
The same seed will generate the same list, so if you want the next iteration to be the same, use the same seed. If you want it to be different, use a different seed.
This may need a lot of CPU and physical memory!
I suggest you classify your data.
For example you could save:
All numbers starting with 10 that are 5 characters long (example: 10365) to 10-5.txt
All numbers starting with 11 that are 6 characters long (example: 114567) to 11-6.txt
Then, to check a new number:
For example, my number is 9256547.
It starts with 92 and is 7 characters long.
So I search 92-7.txt for this number and, if it isn't a duplicate, I add it to 92-7.txt.
Finally, you can join all the files together.
Sorry if I have made mistakes; my main language isn't English.

Best way to hash ordered permutation of [1,9]

I'm trying to implement a method to keep the visited states of the 8-puzzle from being generated again.
My initial approach was to save each visited pattern in a list and do a linear check each time the algorithm wants to generate a child.
Now I want to do this in O(1) time through list access. Each pattern in the 8-puzzle is an ordered permutation of the numbers 1 to 9 (9 being the blank block); for example, 125346987 is:
1 2 5
3 4 6
_ 8 7
The number of all possible permutations of this kind is around 363,000 (9!). What is the best way to hash these numbers to indexes of a list of that size?
You can map a permutation of N items to its index in the list of all permutations of N items (ordered lexicographically).
Here's some code that does this, and a demonstration that it produces indexes 0 to 23 once each for all permutations of a 4-letter sequence.
import itertools

def fact(n):
    r = 1
    for i in range(n):
        r *= i + 1
    return r

def I(perm):
    if len(perm) == 1:
        return 0
    return sum(p < perm[0] for p in perm) * fact(len(perm) - 1) + I(perm[1:])

for p in itertools.permutations('abcd'):
    print(p, I(p))
The best way to understand the code is to prove its correctness. For an array of length n, there's (n-1)! permutations with the smallest element of the array appearing first, (n-1)! permutations with the second smallest element appearing first, and so on.
So, to find the index of a given permutation, count how many items are smaller than the first item in the permutation and multiply that by (n-1)!. Then recursively add the index of the remainder of the permutation, considered as a permutation of (n-1) elements. The base case is when you have a permutation of length 1. Obviously there's only one such permutation, so its index is 0.
A worked example: [1324].
[1324]: 1 appears first, and that's the smallest element in the array, so that gives 0 * (3!)
Removing 1 gives us [324]. The first element is 3. There's one element that's smaller, so that gives us 1 * (2!).
Removing 3 gives us [24]. The first element is 2. That's the smallest element remaining, so that gives us 0 * (1!).
Removing 2 gives us [4]. There's only one element, so we use the base case and get 0.
Adding up, we get 0*3! + 1*2! + 0*1! + 0 = 1*2! = 2. So [1324] is at index 2 in the sorted list of permutations of four elements. That's correct, because at index 0 is [1234], at index 1 is [1243], and the lexicographically next permutation is our [1324].
I believe you're asking for a function to map permutations to array indices. This dictionary maps all permutations of numbers 1-9 to values from 0 to 9!-1.
import itertools
index = itertools.count(0)
permutations = itertools.permutations(range(1, 10))
hashes = {h:next(index) for h in permutations}
For example, hashes[(1,2,5,3,4,6,9,8,7)] gives a value of 1445.
If you need them in strings instead of tuples, use:
permutations = [''.join(x) for x in itertools.permutations('123456789')]
or as integers:
permutations = [int(''.join(x)) for x in itertools.permutations('123456789')]
It looks like you are only interested in whether or not you have already visited the permutation.
You should use a set. It grants the O(1) look-up you are interested in.
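For instance, a minimal sketch of that bookkeeping (representing a state as a tuple of the 9 digits is just one option; a string works equally well):
visited = set()

def is_new(state):
    """Return True, and record the state, the first time a state is seen."""
    if state in visited:          # O(1) average-case membership test
        return False
    visited.add(state)
    return True

print(is_new((1, 2, 5, 3, 4, 6, 9, 8, 7)))   # True, first visit
print(is_new((1, 2, 5, 3, 4, 6, 9, 8, 7)))   # False, already seen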
A structure that is efficient in both space and lookup for this problem is a trie, as it shares space for common lexicographical prefixes across permutations.
I.e. the space used for "123" in 1234 and in 1235 will be the same.
Let's assume 0 as a replacement for '_' in your example, for simplicity.
Storing
Your trie will be a tree of booleans. The root node will be an empty node, and each node will contain 9 children, each with a boolean flag initially set to false; the 9 children correspond to the digits 0 to 8 (with 0 standing in for _).
You can create the trie on the go: as you encounter a permutation, store its digits in the trie by setting the flags along its path to true.
Lookup
The trie is traversed from the root to the children based on the digits of the permutation, and if the nodes have been marked as true, that means the permutation has occurred before. The complexity of a lookup is just 9 node hops.
Here is how the trie would look for a 4-digit example (diagram omitted).
This trie can easily be stored in a list of booleans, say myList, where myList[0] is the root, as explained in the concept here:
https://webdocs.cs.ualberta.ca/~holte/T26/tree-as-array.html
The final trie in a list would be around 9 + 9^2 + 9^3 + ... + 9^8 bits, i.e. less than 10 MB for all lookups.
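A rough sketch of the same idea using nested dicts rather than the packed boolean array (illustrative only; 0 stands in for the blank, and every key is one digit of the permutation string):
def mark(trie, perm):
    node = trie
    for ch in perm:
        node = node.setdefault(ch, {})   # create the branch on the go

def visited(trie, perm):
    node = trie
    for ch in perm:                      # at most 9 node hops
        if ch not in node:
            return False
        node = node[ch]
    return True

seen = {}
mark(seen, "125346087")                  # the blank (9) written as 0
print(visited(seen, "125346087"), visited(seen, "012345678"))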
Use
I've developed a heuristic function for this specific case. It is not a perfect hashing, as the mapping is not between [0,9!-1] but between [1,767359], but it is O(1).
Let's assume we already have a file / reserved memory / whatever with 767359 bits set to 0 (e.g., mem = [False] * 767359). Let an 8-puzzle pattern be mapped to a Python string (e.g., '125346987'). Then, the hash function is determined by:
def getPosition(input_str):
    data = []
    opts = list(range(1, 10))
    n = int(input_str[0])
    opts.pop(opts.index(n))
    for c in input_str[1:len(input_str)-1]:
        k = opts.index(int(c))
        opts.pop(k)
        data.append(k)
    ind = data[3]<<14 | data[5]<<12 | data[2]<<9 | data[1]<<6 | data[0]<<3 | data[4]<<1 | data[6]<<0
    output_str = str(ind) + str(n)
    output = int(output_str)
    return output
I.e., in order to check if an 8-puzzle pattern = 125346987 has already been used, we need to:
pattern = '125346987'
pos = getPosition(pattern)
used = mem[pos-1]  # mem starts at 0, getPosition at 1.
With a perfect hashing we would have needed 9! bits to store the booleans. In this case we need 2x more (767359/9! = 2.11), but recall that it is not even 1Mb (barely 100KB).
Note that the function is easily invertible.
Check
I could prove mathematically why this works and why there won't be any collisions, but since this is a programming forum let's just run it for every possible permutation and check that all the hash values (positions) are indeed different:
# CHECKING PURPOSES (uses the getPosition function defined above)
def addperm(x, l):
    return [l[0:i] + [x] + l[i:] for i in range(len(l) + 1)]

def perm(l):
    if len(l) == 0:
        return [[]]
    return [x for y in perm(l[1:]) for x in addperm(l[0], y)]

# We generate all the permutations
all_perms = perm([i for i in range(1, 10)])
print("Number of all possible perms.: " + str(len(all_perms)))  # indeed 9! = 362880

# We execute our hash function over all the perms and store the output.
all_positions = []
for permutation in all_perms:
    perm_string = ''.join(map(str, permutation))
    all_positions.append(getPosition(perm_string))

# We want to check if there has been any collision, i.e., if there
# is one position that is repeated at least twice.
print("Number of different hashes: " + str(len(set(all_positions))))
# also 9!, so the hash works properly.
How does it work?
The idea behind this has to do with a tree: at the beginning it has 9 branches going to 9 nodes, each corresponding to a digit. From each of these nodes we have 8 branches going to 8 nodes, each corresponding to a digit except its parent, then 7, and so on.
We first store the first digit of our input string in a separate variable and pop it out from our 'node' list, because we have already taken the branch corresponding to the first digit.
Then we have 8 branches; we choose the one corresponding to our second digit. Note that, since there are 8 branches, we need 3 bits to store the index of our chosen branch, and the maximum value it can take is 111 for the 8th branch (we map branches 1-8 to binary 000-111). Once we have chosen and stored the branch index, we pop that value out, so that the next node list doesn't include this digit again.
We proceed in the same way for branches 7, 6 and 5. Note that when we have 7 branches we still need 3 bits, though the maximum value will be 110. When we have 5 branches, the index will be at most binary 100.
Then we get to 4 branches and we notice that this can be stored with just 2 bits, and the same goes for 3 branches. For 2 branches we will just need 1 bit, and for the last branch we don't need any bit: there will be just one branch pointing to the last digit, which will be the one remaining from our original 1-9 list.
So, what we have so far: the first digit stored in a separate variable, and a list of 7 indexes representing branches. The first 4 indexes can be represented with 3 bits, the following 2 indexes with 2 bits, and the last index with 1 bit.
The idea is to concatenate all these indexes in their bit form to create a larger number. Since we have 17 bits, this number will be at most 2^17 = 131072. Now we just add the first digit we had stored to the end of that number (at most this digit will be 9) and we get that the biggest number we can create is 1310729.
But we can do better: recall that when we had 5 branches we needed 3 bits, though the maximum value was binary 100. What if we arrange our bits so that those with more 0s come first? If so, in the worst case scenario our final bit number will be the concatenation of:
100 10 101 110 111 11 1
Which in decimal is 76735. Then we proceed as before (adding the 9 at the end) and we get that our biggest possible generated number is 767359, which is the amount of bits we need and which corresponds to input string 987654321, while the lowest possible number is 1, which corresponds to input string 123456789.
Just to finish: one might wonder why we have stored the first digit in a separate variable and added it at the end. The reason is that if we had kept it, then the number of branches at the beginning would have been 9, so for storing the first index (1-9) we would have needed 4 bits (0000 to 1000), which would have made our mapping much less efficient, as in that case the biggest possible number (and therefore the amount of memory needed) would have been
1000 100 10 101 110 111 11 1
which is 1125311 in decimal (1.13 Mbit vs 768 Kbit). It is quite interesting to see that the ratio 1.13M/0.77M ≈ 1.47 has something to do with the ratio of using four bits compared to just appending a decimal digit (2^4/10 = 1.6), which makes a lot of sense (the difference is due to the fact that with the first approach we are not fully using the 4 bits).
First. There is nothing faster than a list of booleans. There's a total of 9! == 362880 possible permutations for your task, which is a reasonably small amount of data to store in memory:
visited_states = [False] * math.factorial(9)
Alternatively, you can use an array of bytes, which is slightly slower (not by much though) and has a much lower memory footprint (by an order of magnitude at least). However, any memory savings from using an array will probably be of little value considering the next step.
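A bytearray version of the same bookkeeping might look like this (a sketch; one byte per permutation instead of one full Python bool object):
import math

visited_states = bytearray(math.factorial(9))   # 362880 zero bytes, i.e. "not visited"
visited_states[1445] = 1                        # mark the permutation with index 1445 as visited
print(bool(visited_states[1445]))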
Second. You need to convert your specific permutation to its index. There are algorithms which do this; one of the best StackOverflow questions on this topic is probably this one:
Finding the index of a given permutation
You have fixed permutation size n == 9, so whatever complexity an algorithm has, it will be equivalent to O(1) in your situation.
However to produce even faster results, you can pre-populate a mapping dictionary which will give you an O(1) lookup:
import itertools

all_permutations = map(lambda p: ''.join(p), itertools.permutations('123456789'))
permutation_index = dict((perm, index) for index, perm in enumerate(all_permutations))
This dictionary will consume about 50 Mb of memory, which is... not that much actually. Especially since you only need to create it once.
After all this is done, checking your specific combination is done with:
visited = visited_states[permutation_index['168249357']]
Marking it as visited is done in the same manner:
visited_states[permutation_index['168249357']] = True
Note that using any of the permutation index algorithms will be much slower than the mapping dictionary. Most of those algorithms are of O(n^2) complexity, and in your case that results in 81 times worse performance, even discounting the extra Python code itself. So unless you have heavy memory constraints, using the mapping dictionary is probably the best solution speed-wise.
Addendum. As has been pointed out by Palec, the visited_states list is actually not needed at all: it's perfectly possible to store the True/False values directly in the permutation_index dictionary, which saves some memory and an extra list lookup.
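A tiny sketch of that addendum (illustrative; every permutation starts as not-visited and the flag is flipped in place):
import itertools

visited = {''.join(p): False for p in itertools.permutations('123456789')}

visited['168249357'] = True      # mark as visited
print(visited['168249357'])      # check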
Notice that if you type hash(125346987) it returns 125346987. That is because there is no point in hashing an integer to anything other than an integer.
What you should do is: when you find a pattern, add it to a dictionary rather than a list. This will provide the fast lookup you need, rather than traversing the list like you are doing now.
So say you find the pattern 125346987 you can do:
foundPatterns = {}
#some code to find the pattern
foundPatterns[1] = 125346987
#more code
#test if there?
125346987 in foundPatterns.values()
True
If you must always have O(1), then it seems like a bit array would do the job. You'd only need to store 363,000 elements, which seems doable. Though note that in practice it's not always faster. The simplest implementation looks like:
Create data structure
visited_bitset = [False for _ in range(373000)]
Test current state and add if not visited yet
if not visited_bitset[current_state]:
    visited_bitset[current_state] = True
Paul's answer might work.
Elisha's answer is a perfectly valid hash function that guarantees no collisions will happen. 9! slots would be the bare minimum for a guaranteed no-collision hash function, but (unless someone corrects me; Paul probably has) I don't believe there exists a function to map each board to a value in the domain [0, 9!], let alone a hash function that is nothing more than O(1).
If you have 1 GB of memory to support a Boolean array of 864,197,532 (i.e. 987654321 - 123456789) indices, you guarantee (computationally) the O(1) requirement.
Practically speaking (meaning when you run on a real system) this isn't going to be cache friendly, but on paper this solution will definitely work. Even if a perfect function did exist, I doubt it would be cache friendly either.
Using prebuilts like set or hashmap (sorry, I haven't programmed Python in a while, so I don't remember the exact datatype) must have an amortized O(1). But using one of these with a suboptimal hash function like n % RANDOM_PRIME_NUM_GREATER_THAN_100000 might give the best solution.

Lists and while loops - Python

I am fairly new to Python and I am stuck on a particular question, so I thought I'd ask you guys.
The following contains my code so far, as well as the questions that lie therein:
list = [100, 20, 30, 40, etc...]
Just a list with different numeric values representing an object's weight in grams.
object = 0
while len(list) > 0:
    list_calculation = list.pop(0)
    print("object number:", object, "evaluates to")
What I want to do next is evaluate the items in the list, so that if we go with index [0], we have a list value of 100. Then I want to separate this into smaller pieces: for a 100 gram object, one would split it into five 20 gram units. If the value being split up was 35, then it would be one 20 gram unit, one 10 gram unit and one 5 gram unit.
The five units I want to split into are: 20, 10, 5, 1 and 0.5.
If anyone has a quick tip regarding my issue, it would be much appreciated.
Regards
You should think about solving this for a single number first. So what you essentially want to do is split up a number into a partition of known components. This is also known as the Change-making problem. You can choose a greedy algorithm for this that always takes the largest component size as long as it’s still possible:
units = [20, 10, 5, 1, 0.5]

def change(number):
    counts = {}
    for unit in units:
        count, number = divmod(number, unit)
        counts[unit] = count
    return counts
So this will return a dictionary that maps from each unit to the count of that unit required to get to the target number.
You just need to call that function for each item in your original list.
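For instance, with a list like the one in the question, a small usage sketch might look like this (note that divmod mixes ints and floats here, so some counts come back as floats):
weights = [100, 20, 30, 40, 35]
for weight in weights:
    print(weight, "evaluates to", change(weight))
# e.g. 35 evaluates to {20: 1, 10: 1, 5: 1, 1: 0, 0.5: 0.0}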
One way you could do it is with a double for loop. The outer loop would be over the numbers you input and the inner loop over the values you want to evaluate (i.e. [20, 10, 5, 1, 0.5]). For each iteration of the inner loop, find how many times the value goes into the number (using math.floor), and then use the modulo operator to reassign the number to the remainder. On each loop you can have it print out the info that you want :) I'm not sure exactly what kind of output you're looking for, but I hope this helps!
Ex:
import math

myList = [100, 20, 30, 40, 35]
values = [20, 10, 5, 1, 0.5]
for i in myList:
    print(str(i) + " evaluates to: ")
    for num in values:
        evaluation = math.floor(i / num)
        print("\t" + str(num) + "'s: " + str(evaluation))
        i %= num
