Find the element in virtually infinite list

I'm trying to solve this problem:
A list is initialized to ["Sheldon", "Leonard", "Penny", "Rajesh", "Howard"], and then undergoes a series of operations. In each operation, the first element of the list is moved to the end of the list and duplicated. For example, in the first operation, the list becomes ["Leonard", "Penny", "Rajesh", "Howard", "Sheldon", "Sheldon"] (with "Sheldon" being moved and duplicated); in the second operation, it becomes ["Penny", "Rajesh", "Howard", "Sheldon", "Sheldon", "Leonard", "Leonard"] (with "Leonard" being moved and duplicated); etc. Given a positive integer n, find the string that is moved and duplicated in the nth operation. [paraphrased from https://codeforces.com/problemset/problem/82/A]
I've written a working solution, but it's too slow when n is huge:
l = ['Sheldon','Leonard','Penny','Rajesh','Howard']
n = int(input()) # taking input from user to print the name of the person
                 # standing at that position
for i in range(n):
    t = l.pop(0)
    l.append(t)
    l.append(t)
    #debug
    # print(l)
print(t)
How can I do this faster?

Here's a solution that runs in O(log(input/len(l))) without doing any actual computation (no list operations):
l = ['Sheldon','Leonard','Penny','Rajesh','Howard']
n = int(input()) # taking input from user to print the name of the person
                 # standing at that position
i = 0
# skip whole rounds: round i contains len(l) * 2**i operations
while n > len(l) * 2**i:
    n = n - len(l) * 2**i
    i = i + 1
# within round i each name occupies 2**i consecutive positions
index = (n - 1) // 2**i
print(l[index])
Explanation: every full pass through the list (a "round") doubles the number of copies of each name, so round i consists of exactly len(l) x 2^i operations. You first have to find out how many complete rounds come before operation n. This is what the while loop does: n = n - len(l) * (2**i) skips one complete round, and the loop stops once the remaining n falls inside round i. After you have figured out i, you have to compute the index. In round i every element appears 2^i times in a row, so you have to divide the position by 2**i. One minor detail is that you have to subtract 1 first, because lists in Python are 0-indexed while your input is 1-indexed.
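To check the reasoning above, here is a minimal sketch that wraps the same loop in a function and compares it with the brute-force simulation from the question for small n. The names who_drinks and who_drinks_bruteforce are illustrative and not part of the original answer.

```python
NAMES = ['Sheldon', 'Leonard', 'Penny', 'Rajesh', 'Howard']

def who_drinks(n, names=NAMES):
    """Return the name moved in the n-th operation (n is 1-indexed)."""
    block = 1  # copies of each name in the current round, i.e. 2**i
    while n > len(names) * block:
        n -= len(names) * block  # skip one complete round
        block *= 2
    return names[(n - 1) // block]

def who_drinks_bruteforce(n, names=NAMES):
    """Direct simulation from the question, only usable for small n."""
    queue = list(names)
    for _ in range(n):
        t = queue.pop(0)
        queue.append(t)
        queue.append(t)
    return t

# The two versions agree on every small input.
assert all(who_drinks(n) == who_drinks_bruteforce(n) for n in range(1, 500))
print(who_drinks(50))    # Leonard
print(who_drinks(1802))  # Penny
```

The fast version only loops once per round, so it stays quick even for n around 10**9.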

As @khelwood said, you can deduce how many times the list has doubled.
To understand this, note that if you start with a list of 5 people and do 5 steps of your iteration, you will have the same order as before, just with everyone in it twice.
I am not 100% sure what you mean by the nth position, as it shifts all the time, but if you mean the person in front at the nth operation, solve for the largest integer i that fulfills
5*(2^i - 1) < n
to get the number of times your list has doubled. Then just look at the remaining list (each name appears 2^i times in a row) to get the name at position n - 5*(2^i - 1).

You are not going to be able to avoid calculating the list, but maybe you can make it a bit easier:
Every cycle (when Sheldon is first again) the length of the list has doubled, so it looks like this:
After 1 cycle: SSLLPPRRHH
After 2 cycles: SSSSLLLLPPPPRRRRHHHH
...
while the number of colas they have drunk is 5*((2**n)-1), where n is the number of cycles.
So you can calculate the state of the list at the most recently completed cycle.
E.g.
Cola number 50:
5*((2**3)-1) = 35 means that after 35 colas Sheldon is next in line, and every name appears 8 times in a row.
Then you can use the algorithm described in the task and get the last one in the line.
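As a rough sketch of that calculation, the snippet below uses the 5*((2**n)-1) formula to find the last completed cycle for a given cola number and then reads off the name. The helper name cycle_state is hypothetical.

```python
NAMES = ['Sheldon', 'Leonard', 'Penny', 'Rajesh', 'Howard']

def cycle_state(cola):
    """Return (completed cycles, colas drunk in them, copies of each name)."""
    cycles = 0
    # 5 * (2**c - 1) colas have been drunk after c complete cycles.
    while 5 * (2 ** (cycles + 1) - 1) < cola:
        cycles += 1
    return cycles, 5 * (2 ** cycles - 1), 2 ** cycles

cycles, drunk, copies = cycle_state(50)
offset = 50 - drunk  # 1-indexed position inside the current cycle
print(cycles, drunk, copies)          # 3 35 8
print(NAMES[(offset - 1) // copies])  # Leonard
```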
Hope this helps.

Related

What's The Big(O) Complexity of this loop?

def cups (a,b):
    i=0
    j=0
    done = False
    while not done:
        if a[i]==b[j] :
            print("A[" + str(i) + "] with B[" + str(j) + "]")
            i += 1
            j = 0
            if i == len(a):
                i=0
                done = True
        if a[i] != b[j] :
            j += 1
I'm trying to compare two lists and print the indices of two values that are the same in the two lists
I'm curious whether the complexity is O(1) or O(n)?

I suspect that your code might not be working the way you want it to, but this question seems to be more about algorithmic complexity and less about your particular implementation, so I'll focus on that.
In general, two sorted lists can be compared in the way you describe in linear time by advancing pointers. We can use two streets of numbered houses as a concrete example: if you and I want to find out what house numbers are duplicated on Main Street and Elm street, we can do the following:
1. You start at the bottom of Main Street, I start at the bottom of Elm Street.
2. We each report the number that we see.
3. If the numbers match, we record that number.
4. If not, one of us is seeing a number that is lower than the other's. That one walks up the street until they find a number which is equal to or greater than the last one reported by the other, and we repeat from step 3.
In this case, neither of us ever goes back to the bottom of the street.
However, if the lists are not sorted, then we have to use a different approach. Assuming that Main Street and Elm street are numbered in random order, we have to do the following:
1. I start at the bottom of Elm Street.
2. I report the number that I see.
3. You start at the bottom of Main Street and walk up until you find a house that has that number, or until you reach the end of Main. If you find a match, we record it.
4. I advance to the next house and we repeat from step 3.
This is an O(n**2) algorithm, as you can see (you walk up Main St. once for each house on Elm, if there are n houses on each street then we're looking at n * n operations)
This is the state you're in - the problem you're stating cannot be solved in O(n)
However, I will point out that sorting a list is an O(n log n) operation, which would allow you to reduce the problem to the linear case, for a final complexity of O(n log n)
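A minimal sketch of that sort-then-walk idea is below. The function name common_values is illustrative, and it reports matching values rather than the original indices; to keep indices one could sort (value, index) pairs instead.

```python
def common_values(a, b):
    """Return the values present in both lists, using a sort plus one linear walk."""
    a, b = sorted(a), sorted(b)       # O(n log n)
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):  # O(n): neither pointer ever moves back
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                    # a is behind, so advance it
        else:
            j += 1                    # b is behind, so advance it
    return matches

print(common_values([7, 3, 9, 1], [9, 4, 3, 8]))  # [3, 9]
```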

Putting aside any bugs you have in the code - and concentrating on the runtime complexity question:
O(1) means the code is running at a constant time regardless of the size of the input, and clearly here it is not the case since different sizes of a and b will result in different numbers of loop iterations.
In your case your while loop can iterate at most len(a) * len(b) times (roughly)
So you can say that the code has a run time complexity on the order of O(n), where n = len(a)*len(b), i.e. O(len(a) * len(b)) in terms of the two list lengths.

You're scanning all elements of at least one list, so it cannot be constant
The best case, since you're setting j=0, is when all elements of a equal the first element of b, therefore making all elements of a the same, and this is linear time, but big-O does not measure best case
In the worst case, you're scanning all elements of both lists. For each non-equal element, you're scanning all of b until matched; and when you find a match, you're resetting j back to the start of b, so really it would be O(N*M), and for equal length lists, this is O(N^2)
Note: a more generic algorithm for this is:
for i, a_elem in enumerate(a):
    for j, b_elem in enumerate(b):
        if a_elem == b_elem:
            print(f"a[{i}] with b[{j}]")

Best way to hash ordered permutation of [1,9]

I'm trying to implement a method to keep the visited states of the 8 puzzle from generating again.
My initial approach was to save each visited pattern in a list and do a linear check each time the algorithm wants to generate a child.
Now I want to do this in O(1) time through list access. Each pattern in the 8 puzzle is an ordered permutation of the numbers 1 to 9 (9 being the blank block), for example 125346987 is:
1 2 5
3 4 6
_ 8 7
The number of all possible permutations of this kind is around 363,000 (9!). What is the best way to hash these numbers to indexes of a list of that size?

You can map a permutation of N items to its index in the list of all permutations of N items (ordered lexicographically).
Here's some code that does this, and a demonstration that it produces indexes 0 to 23 once each for all permutations of a 4-letter sequence.
import itertools
def fact(n):
    r = 1
    for i in range(n):
        r *= i + 1
    return r
def I(perm):
    if len(perm) == 1:
        return 0
    return sum(p < perm[0] for p in perm) * fact(len(perm) - 1) + I(perm[1:])
for p in itertools.permutations('abcd'):
    print(p, I(p))
The best way to understand the code is to prove its correctness. For an array of length n, there's (n-1)! permutations with the smallest element of the array appearing first, (n-1)! permutations with the second smallest element appearing first, and so on.
So, to find the index of a given permutation, count how many items are smaller than the first thing in the permutation and multiply that by (n-1)!. Then recursively add the index of the remainder of the permutation, considered as a permutation of (n-1) elements. The base case is when you have a permutation of length 1. Obviously there's only one such permutation, so its index is 0.
A worked example: [1324].
[1324]: 1 appears first, and that's the smallest element in the array, so that gives 0 * (3!)
Removing 1 gives us [324]. The first element is 3. There's one element that's smaller, so that gives us 1 * (2!).
Removing 3 gives us [24]. The first element is 2. That's the smallest element remaining, so that gives us 0 * (1!).
Removing 2 gives us [4]. There's only one element, so we use the base case and get 0.
Adding up, we get 0*3! + 1*2! + 0*1! + 0 = 1*2! = 2. So [1324] is at index 2 in the sorted list of all permutations of 4 items. That's correct, because at index 0 is [1234], index 1 is [1243], and the lexicographically next permutation is our [1324].
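For the 8 puzzle itself, the same counting argument can be written as a loop. This is only a sketch: perm_index is an illustrative name, and it assumes the pattern is given as a string of distinct single digits.

```python
from math import factorial

def perm_index(perm):
    """Lexicographic index of perm among all permutations of its items."""
    perm = list(perm)
    index = 0
    for pos, head in enumerate(perm):
        rest = perm[pos + 1:]
        # Items after head that sort before it each skip a block of len(rest)! permutations.
        smaller = sum(x < head for x in rest)
        index += smaller * factorial(len(rest))
    return index

print(perm_index('1324'))       # 2, matching the worked example
print(perm_index('125346987'))  # 1445
```

The result is always in the range 0 to 9!-1 for a 9-digit pattern, so it can be used directly as an index into a list of 362880 booleans.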

I believe you're asking for a function to map permutations to array indices. This dictionary maps all permutations of numbers 1-9 to values from 0 to 9!-1.
import itertools
index = itertools.count(0)
permutations = itertools.permutations(range(1, 10))
hashes = {h:next(index) for h in permutations}
For example, hashes[(1,2,5,3,4,6,9,8,7)] gives a value of 1445.
If you need them in strings instead of tuples, use:
permutations = [''.join(x) for x in itertools.permutations('123456789')]
or as integers:
permutations = [int(''.join(x)) for x in itertools.permutations('123456789')]

It looks like you are only interested in whether or not you have already visited the permutation.
You should use a set. It grants the O(1) look-up you are interested in.
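A minimal sketch of that, assuming each state is stored as a string such as '125346987' (the helper name visit is made up for illustration):

```python
visited = set()

def visit(state):
    """Return True if state is new (and record it), False if it was seen before."""
    if state in visited:
        return False
    visited.add(state)
    return True

print(visit('125346987'))  # True, first time
print(visit('125346987'))  # False, already visited
```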

A structure that is efficient in both space and lookup time for this problem is a trie, as it shares storage for common prefixes across permutations,
i.e. the space used for "123" in 1234 and in 1235 will be the same.
Let's assume 0 as a replacement for '_' in your example for simplicity.
Storing
Your trie will be a tree of booleans. The root node will be an empty node, and each node will contain 9 children with a boolean flag set to false; the 9 children correspond to the digits 0 to 8 (0 standing for _).
You can create the trie on the go, as you encounter a permutation, and store the encountered digits as booleans in the trie by setting the flag to true.
Lookup
The trie is traversed from the root to the children based on the digits of the permutation, and if the nodes have been marked as true, that means the permutation has occurred before. The complexity of a lookup is just 9 node hops.
Here is how the trie would look for a 4-digit example:
[Image: Python trie]
This trie can be easily stored in a list of booleans, say myList.
Where myList[0] is the root, as explained in the concept here :
https://webdocs.cs.ualberta.ca/~holte/T26/tree-as-array.html
The final trie in a list would be around 9 + 9^2 + 9^3 + ... + 9^8 bits, i.e. less than 10 MB for all lookups.
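As a rough illustration of the storing and lookup steps, here is a sketch that uses nested dicts instead of the flat boolean list described above, which is a simplification of my own. It relies on all stored permutations having the same length, so a complete path exists only if that exact permutation was inserted. The names make_trie and insert are hypothetical.

```python
def make_trie():
    """An empty trie: each node is a dict mapping a digit to a child node."""
    return {}

def insert(trie, perm):
    """Walk or create one node per digit; return True if perm was already stored."""
    node = trie
    seen = True
    for digit in perm:
        if digit not in node:
            node[digit] = {}   # first time this prefix is encountered
            seen = False
        node = node[digit]
    return seen

trie = make_trie()
print(insert(trie, '1234'))  # False, new
print(insert(trie, '1235'))  # False, new, but shares the '123' prefix nodes
print(insert(trie, '1234'))  # True, seen before
```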

Use
I've developed a heuristic function for this specific case. It is not a perfect hashing, as the mapping is not between [0,9!-1] but between [1,767359], but it is O(1).
Let's assume we already have a file / reserved memory / whatever with 767359 bits set to 0 (e.g., mem = [False] * 767359). Let an 8-puzzle pattern be mapped to a Python string (e.g., '125346987'). Then, the hash function is determined by:
def getPosition( input_str ):
    data = []
    opts = list(range(1,10))
    n = int(input_str[0])
    opts.pop(opts.index(n))
    for c in input_str[1:len(input_str)-1]:
        k = opts.index(int(c))
        opts.pop(k)
        data.append(k)
    ind = data[3]<<14 | data[5]<<12 | data[2]<<9 | data[1]<<6 | data[0]<<3 | data[4]<<1 | data[6]<<0
    output_str = str(ind)+str(n)
    output = int(output_str)
    return output
I.e., in order to check if a 8puzzle pattern = 125346987 has already been used, we need to:
pattern = '125346987'
pos = getPosition(pattern)
used = mem[pos-1] #mem starts in 0, getPosition in 1.
With a perfect hashing we would have needed 9! bits to store the booleans. In this case we need 2x more (767359/9! = 2.11), but recall that it is not even 1Mb (barely 100KB).
Note that the function is easily invertible.
Check
I could prove to you mathematically why this works and why there won't be any collisions, but since this is a programming forum let's just run it for every possible permutation and check that all the hash values (positions) are indeed different:
def getPosition( input_str ):
    data = []
    opts = list(range(1,10))
    n = int(input_str[0])
    opts.pop(opts.index(n))
    for c in input_str[1:len(input_str)-1]:
        k = opts.index(int(c))
        opts.pop(k)
        data.append(k)
    ind = data[3]<<14 | data[5]<<12 | data[2]<<9 | data[1]<<6 | data[0]<<3 | data[4]<<1 | data[6]<<0
    output_str = str(ind)+str(n)
    output = int(output_str)
    return output
#CHECKING PURPOSES
def addperm(x,l):
    return [ l[0:i] + [x] + l[i:] for i in range(len(l)+1) ]
def perm(l):
    if len(l) == 0:
        return [[]]
    return [x for y in perm(l[1:]) for x in addperm(l[0],y) ]
#We generate all the permutations
all_perms = perm([ i for i in range(1,10)])
print("Number of all possible perms.: "+str(len(all_perms))) #indeed 9! = 362880
#We execute our hash function over all the perms and store the output.
all_positions = []
for permutation in all_perms:
    perm_string = ''.join(map(str,permutation))
    all_positions.append(getPosition(perm_string))
#We want to check if there has been any collision, i.e., if there
#is one position that is repeated at least twice.
print("Number of different hashes: "+str(len(set(all_positions))))
#also 9!, so the hash works properly.
How does it work?
The idea behind this has to do with a tree: at the beginning it has 9 branches going to 9 nodes, each corresponding to a digit. From each of these nodes we have 8 branches going to 8 nodes, each corresponding to a digit except its parent, then 7, and so on.
We first store the first digit of our input string in a separate variable and pop it out from our 'node' list, because we have already taken the branch corresponding to the first digit.
Then we have 8 branches, we choose the one corresponding with our second digit. Note that, since there are 8 branches, we need 3 bits to store the index of our chosen branch and the maximum value it can take is 111 for the 8th branch (we map branch 1-8 to binary 000-111). Once we have chosen and store the branch index, we pop that value out, so that the next node list doesn't include again this digit.
We proceed in the same way for branches 7, 6 and 5. Note that when we have 7 branches we still need 3 bits, though the maximum value will be 110. When we have 5 branches, the index will be at most binary 100.
Then we get to 4 branches and we notice that this can be stored just with 2 bits, same for 3 branches. For 2 branches we will just need 1bit, and for the last branch we don't need any bit: there will be just one branch pointing to the last digit, which will be the remaining from our 1-9 original list.
So, what we have so far: the first digit stored in a separated variable and a list of 7 indexes representing branches. The first 4 indexes can be represented with 3bits, the following 2 indexes can be represented with 2bits and the last index with 1bit.
The idea is to concatenate all these indexes in their bit form to create a larger number. Since we have 17 bits, this number will be at most 2^17 - 1 = 131071. Now we just add the first digit we had stored to the end of that number (at most this digit will be 9) and we have that the biggest number we can create is 1310719.
But we can do better: recall that when we had 5 branches we needed 3 bits, though the maximum value was binary 100. What if we arrange our bits so that those with more 0s come first? If so, in the worst case scenario our final bit number will be the concatenation of:
100 10 101 110 111 11 1
Which in decimal is 76735. Then we proceed as before (adding the 9 at the end) and we get that our biggest possible generated number is 767359, which is the amount of bits we need and corresponds to input string 987654321, while the lowest possible number is 1, which corresponds to input string 123456789.
Just to finish: one might wonder why we have stored the first digit in a separate variable and added it at the end. The reason is that if we had kept it, then the number of branches at the beginning would have been 9, so for storing the first index (1-9) we would have needed 4 bits (0000 to 1000), which would have made our mapping much less efficient, as in that case the biggest possible number (and therefore the amount of memory needed) would have been
1000 100 10 101 110 111 11 1
which is 1125311 in decimal (1.13Mb vs 768Kb). It is quite interesting to see that the ratio 1.13M/0.768M = 1.47 has something to do with the ratio of the four bits compared to just adding a decimal value (2^4/10 = 1.6), which makes a lot of sense (the difference is due to the fact that with the first approach we are not fully using the 4 bits).
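Since the answer mentions that the function is easily invertible, here is one possible sketch of the inverse. get_position is a Python 3 port of getPosition from above, while get_pattern and its layout table are my own illustration of how the bit fields could be unpacked, not code from the original answer.

```python
def get_position(input_str):
    """Python 3 port of getPosition from the answer above."""
    opts = list(range(1, 10))
    n = int(input_str[0])
    opts.remove(n)
    data = []
    for c in input_str[1:-1]:
        k = opts.index(int(c))
        opts.pop(k)
        data.append(k)
    ind = (data[3] << 14 | data[5] << 12 | data[2] << 9 | data[1] << 6
           | data[0] << 3 | data[4] << 1 | data[6])
    return int(str(ind) + str(n))

def get_pattern(position):
    """Hypothetical inverse: rebuild the 9-digit pattern from its position."""
    ind, n = divmod(position, 10)  # the last decimal digit is the first puzzle digit
    # (shift, mask) pairs for data[0]..data[6], mirroring the bit layout above.
    layout = [(3, 7), (6, 7), (9, 7), (14, 7), (1, 3), (12, 3), (0, 1)]
    opts = list(range(1, 10))
    opts.remove(n)
    digits = [n]
    for shift, mask in layout:
        digits.append(opts.pop((ind >> shift) & mask))
    digits.append(opts[0])  # the last digit is whatever remains
    return ''.join(map(str, digits))

print(get_position('987654321'))  # 767359
print(get_pattern(767359))        # 987654321
```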

First. There is nothing faster than a list of booleans. There's a total of 9! == 362880 possible permutations for your task, which is a reasonably small amount of data to store in memory:
visited_states = [False] * math.factorial(9)
Alternatively, you can use an array of bytes, which is slightly slower (not by much though) and has a much lower memory footprint (by an order of magnitude at least). However, any memory savings from using an array will probably be of little value considering the next step.
Second. You need to convert your specific permutation to its index. There are algorithms which do this; one of the best StackOverflow questions on this topic is probably this one:
Finding the index of a given permutation
You have fixed permutation size n == 9, so whatever complexity an algorithm has, it will be equivalent to O(1) in your situation.
However to produce even faster results, you can pre-populate a mapping dictionary which will give you an O(1) lookup:
all_permutations = map(lambda p: ''.join(p), itertools.permutations('123456789'))
permutation_index = dict((perm, index) for index, perm in enumerate(all_permutations))
This dictionary will consume about 50 Mb of memory, which is... not that much actually. Especially since you only need to create it once.
After all this is done, checking your specific combination is done with:
visited = visited_states[permutation_index['168249357']]
Marking it to visited is done in the same manner:
visited_states[permutation_index['168249357']] = True
Note that using any of the permutation index algorithms will be much slower than the mapping dictionary. Most of those algorithms are of O(n^2) complexity, and in your case that results in roughly 81 times worse performance, even discounting the extra Python code itself. So unless you have heavy memory constraints, using the mapping dictionary is probably the best solution speed-wise.
Addendum. As has been pointed out by Palec, visited_states list is actually not needed at all - it's perfectly possible to store True/False values directly in the permutation_index dictionary, which saves some memory and an extra list lookup.

Notice if you type hash(125346987) it returns 125346987. That is for a reason, because there is no point in hashing an integer to anything other than an integer.
What you should do, is when you find a pattern add it to a dictionary rather than a list. This will provide the fast lookup you need rather than traversing the list like you are doing now.
So say you find the pattern 125346987 you can do:
foundPatterns = {}
#some code to find the pattern
foundPatterns[125346987] = True
#more code
#test if there?
125346987 in foundPatterns
True

If you must always have O(1), then it seems like a bit array would do the job. You'd only need to store about 363,000 elements, which seems doable. Though note that in practice it's not always faster. The simplest implementation looks like:
Create data structure
visited_bitset = [False for _ in range(363000)]
Test current state and add if not visited yet
if not visited_bitset[current_state]:
    visited_bitset[current_state] = True

Paul's answer might work.
Elisha's answer is a perfectly valid hash function that would guarantee that no collisions happen. 9! would be the bare minimum for a guaranteed collision-free hash function, but (unless someone corrects me, Paul probably has) I don't believe there exists a function to map each board to a value in the domain [0, 9!], let alone a hash function that is nothing more than O(1).
If you have 1 GB of memory to support a Boolean array of 864197532 (aka 987654321-123456789) indices, you guarantee (computationally) the O(1) requirement.
Practically speaking (meaning when you run on a real system) this isn't going to be cache friendly, but on paper this solution will definitely work. Even if a perfect function did exist, I doubt it would be cache friendly either.
Using prebuilts like set or hashmap (sorry, I haven't programmed in Python in a while, so I don't remember the datatype) must have an amortized O(1). But using one of these with a suboptimal hash function like n % RANDOM_PRIME_NUM_GREATER_THAN_100000 might give the best solution.

Algos - Delete Extremes From A List of Integers in Python?

I want to eliminate extremes from a list of integers in Python. I'd say that my problem is one of design. Here's what I cooked up so far:
listToTest = [120,130,140,160,200]
def function(l):
    length = len(l)
    for x in range(0,length - 1):
        if l[x] < (l[x+1] - l[x]) * 4:
            l.remove(l[x+1])
    return l
print(function(listToTest))
So the output of this should be: 120,130,140,160 without 200, since that's way too far ahead from the others.
And this works, given 200 is the last one or there's only one extreme. Though, it gets problematic with a list like this:
listToTest = [120,200,130,140,160,200]
Or
listToTest = [120,130,140,160,200,140,130,120,200]
So, the output for the last list should be: 120,130,140,160,140,130,120. 200 should be gone, since it's a lot bigger than the "usual", which revolved around ~130-140.
To illustrate it, here's an image:
Obviously, my method doesn't work. Some thoughts:
- I need to somehow do a comparison between x and x+1, see if the next two pairs have a bigger difference than the last pair, then if it does, the pair that has a bigger difference should have one element eliminated (the biggest one), then, recursively do this again. I think I should also have an "acceptable difference", so it knows when the difference is acceptable and not break the recursivity so I end up with only 2 values.
I tried writing it, but no luck so far.

You can use statistics here, eliminating values that fall beyond n standard deviations from the mean:
import numpy as np
test = [120,130,140,160,200,140,130,120,200]
n = 1
output = [x for x in test if abs(x - np.mean(test)) < np.std(test) * n]
# output is [120, 130, 140, 160, 140, 130, 120]

Your problem statement is not clear. If you simply want to remove the max and min, then that is a simple O(N) operation with 2 extra variables of memory, which is O(1). This is achieved by retaining the current min/max value and comparing it to each entry in the list in turn.
If you want to remove the min/max K items, it is still an O(N + K log K) operation with O(K) extra memory. This is achieved with two priority queues of size K: one for the mins, one for the maxes.
Or did you intend a different output/outcome from your algorithm?
UPDATE the OP has updated the question: it appears they want a moving (/windowed) average and to delete outliers.
The following is an online algorithm -i.e. it can handle streaming data http://en.wikipedia.org/wiki/Online_algorithm
We can retain a moving average: let's say you keep K entries for the average.
Then create a linked list of size K and a pointer to the head and tail. Now: handling items within the first K entries needs to be thought out separately. After the first K retained items the algo can proceed as follows:
check the next item in the input list against the running k-average. If the value exceeds the acceptable ratio threshold then put its list index into a separate "deletion queue" list. Otherwise: update the running windowed sum as follows:
(a) remove the head entry from the linked list and subtract its value from the running sum
(b) add the latest list entry as the tail of the linked list and add its value to the running sum
(c) recalculate the running average as the running sum /K
Now: how to handle the first K entries? - i.e. before we have a properly initialized running sum?
You will need to make some hard-coded decisions here. A possibility:
Run through the first K+2D (D << K) entries.
Keep track of the D max and D min values.
Remove those D max and D min values from that list.
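The windowed approach above could be sketched roughly as follows. This is a simplified variant under my own assumptions: a collections.deque stands in for the linked list, outliers are skipped directly instead of being put in a deletion queue, the window is seeded with the median of the first k entries, and only values above the average are treated as outliers. The parameters k and max_ratio are illustrative defaults, not values from the answer.

```python
from collections import deque
from statistics import median

def drop_outliers(values, k=3, max_ratio=1.3):
    """Keep values that are at most max_ratio times the running k-average."""
    if len(values) < k:
        return list(values)
    # Assumption: seed the window with the median of the first k entries,
    # so that an early outlier does not skew the starting average.
    seed = median(values[:k])
    window = deque([seed] * k, maxlen=k)
    running_sum = seed * k
    kept = []
    for v in values:
        average = running_sum / k
        if v > average * max_ratio:
            continue                   # outlier: skip it, window stays unchanged
        running_sum += v - window[0]   # (a) subtract the head, (b) add the new tail
        window.append(v)               # maxlen evicts the head automatically
        kept.append(v)
    return kept

print(drop_outliers([120, 130, 140, 160, 200, 140, 130, 120, 200]))
# [120, 130, 140, 160, 140, 130, 120]
```

The right threshold depends on the data; 1.3 happens to separate 200 from the rest in the lists from the question.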

List's and while loops - Python

I am fairly new to Python and I am stuck on a particular question, so I thought I'd ask you guys.
The following contains my code so far, as well as the questions that lie therein:
list=[100,20,30,40 etc...]
Just a list with different numeric values representing an object's weight in grams.
object=0
while len(list)>0:
    list_caluclation=list.pop(0)
    print("object number:",(object),"evaluates to")
What I want to do next is evaluate the items in the list. So if we go with index[0], we have a list value of 100. Then I want to separate this into smaller pieces: for a 100 gram object, one would split it into five 20 gram units. If the value being split up was 35, then it would be one 20 gram unit, one 10 gram unit and one 5 gram unit.
The five units i want to split into are: 20, 10, 5, 1 and 0.5.
If anyone has a quick tip regarding my issue, it would be much appreciated.
Regards

You should think about solving this for a single number first. So what you essentially want to do is split up a number into a partition of known components. This is also known as the Change-making problem. You can choose a greedy algorithm for this that always takes the largest component size as long as it’s still possible:
units = [20, 10, 5, 1, 0.5]
def change (number):
    counts = {}
    for unit in units:
        count, number = divmod(number, unit)
        counts[unit] = count
    return counts
So this will return a dictionary that maps from each unit to the count of that unit required to get to the target number.
You just need to call that function for each item in your original list.
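Put together, that could look like the sketch below. The weights list is example data, and converting each count with int() is my own addition so that the counts print as whole numbers even for the 0.5 unit.

```python
units = [20, 10, 5, 1, 0.5]

def change(number):
    """Greedy split of number into the largest possible units first."""
    counts = {}
    for unit in units:
        count, number = divmod(number, unit)
        counts[unit] = int(count)  # divmod returns a float count for float units
    return counts

weights = [100, 20, 30, 40, 35]  # example weights in grams
for position, weight in enumerate(weights, start=1):
    print("object number:", position, "evaluates to", change(weight))

print(change(35))  # {20: 1, 10: 1, 5: 1, 1: 0, 0.5: 0}
```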

One way you could do it is with a double for loop. The outer loop would be the numbers you input and the inner loop would be the values you want to evaluate (i.e. [20,10,5,1,0.5]). For each iteration of the inner loop, find how many times the value goes into the number (using the floor method), and then use the modulo operator to reassign the number to be the remainder. On each loop you can have it print out the info that you want :) I'm not sure exactly what kind of output you're looking for, but I hope this helps!
Ex:
import math
myList=[100,20,30,40,35]
values=[20,10,5,1,0.5]
for i in myList:
    print(str(i)+" evaluates to: ")
    for num in values:
        evaluation=math.floor(i/num)
        print("\t"+str(num)+"'s: "+str(evaluation))
        i%=num

deconstructing word solution

I have a word problem I am trying to solve but am getting stuck on a key part.
Initialize n to be 100. Initialize numbers to be a list of numbers from 2 to n, but not including n.
With results starting as the empty list, repeat the following as long as numbers contains any numbers.
Add the first number in numbers to the end of results.
Remove every number in numbers that is evenly divisible by (has no remainder when divided by) the number that you had just added to results.
How long is results?
When n is 100, the length of results is 25.
So far I have understood to set n = 100, and a range(2, 100), results = []
and that the result will be an append situation as in results.append(numbers[]),
but I am having a mental block figuring the key of Remove every number in numbers that is divisible by the number that was added to results.
I know this will be a floor or modulo solution taking from one list to another and working via a while loop. I can also figure the length will be len(results). Any assistance or guidance will be greatly appreciated.

If your new number is newnumber, then you can select only elements from a list which are not divisible by it:
results = [x for x in results if x%newnumber!=0]
results.append(newnumber)
Here newnumber is added afterwards because it is more reasonable to do it (otherwise, it itself would be removed by the filtering).
If you insist on doing it in that order, then it's a bit uglier:
results.append(newnumber)
results = [x for x in results[:-1] if x%newnumber!=0] + [newnumber]
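For the exercise as stated in the question, the same filtering idea can be applied to numbers inside a while loop. This is a sketch of one way to do it, reusing the variable names from the question.

```python
n = 100
numbers = list(range(2, n))  # 2 up to, but not including, n
results = []
while numbers:               # repeat as long as numbers contains any numbers
    newnumber = numbers[0]
    results.append(newnumber)
    # Drop everything evenly divisible by newnumber (including newnumber itself).
    numbers = [x for x in numbers if x % newnumber != 0]

print(len(results))  # 25
```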
