memoization in python, off by one errors - python

I'm currently taking an algorithms class. I'm testing a lot of them out in python, including dynamic programming. Here is an implementation of the bottom up rod cutting implementation.
It doesn't work because of the off-by-one error. Is there a global setting in python where I can change the default array index to be 1 and not 0? Or can someone please provide me with a better strategy for over-coming the off-by-one errors, which I encounter a million times. It's super annoying.
def bottom_up_memo_cut_rod(p,n):
r = [ 0 for i in range(n) ]
r[0] = 0
for j in range(n):
q = -1
for i in range(j):
q = max(q, p[i] + r[j-i])
r[j] = q
return r[n]
bottom_up_memo_cut_rod([1,5,8,9], 4)
answer should be 10 in this case cutting 4 into (2,2) yields the max price of 10.

There are a couple of things in Python that may help you. The built-in enumerate is a great one.
for idx, val_at_idx in enumerate(aList):
# idx is the 0-indexed position, val_at_idx is the actual value.
You can also use list slicing with enumerate to shift indices if absolutely necessary:
for idxOffBy1, val_at_wrong_idx in enumerate(aList[1:]):
# idx here will be 0, but the value will be be from position 1 in the original list.
Realistically though, you don't want to try to change the interpreter so that lists start at index 1. You want to adjust your algorithm to work with the language.

In Python, you can often avoid working with the indices altogether. That algorithm can be written like this:
def bottom_up_memo_cut_rod(p,n):
r = [0]
for dummy in p:
r.append(max(a + b for a, b in zip(reversed(r),p)))
return r[-1]
print bottom_up_memo_cut_rod([1,5,8,9], 4)
#10

In your case, off-by-one is a result of r[n] where len(r)==n. You either write r[n-1], or, more preferably, r[-1], which means "the last element of r", the same way r[-2] will mean "second last" etc.
Unrelated, but useful: [ 0 for i in range(n) ] can be written as [0] * n

Related

Python list first n entries in a custom base number system

I am sorry if the title is a misnomer and/or doesn't properly describe what this is all about, you are welcome to edit the title to make it clear once you understand what this is about.
The thing is very simple, but I find it hard to describe, this thing is sorta like a number system, except it is about lists of integers.
So we start with a list of integers with only zero, foreach iteration we add one to it, until a certain limit is reached, then we insert 1 at the start of the list, and set the second element to 0, then iterate over the second element until the limit is reached again, then we add 1 to the first element and set the second element 0, and when the first element reaches the limit, insert another element with value of 1 to the start of the list, and zero the two elements after it, et cetera.
And just like this, when a place reaches limit, zero the place and the places after it, increase the place before it by one, and when all available places reach limit, add 1 to the left, for example:
0
1
2
1, 0
1, 1
1, 2
2, 0
2, 1
2, 2
1, 0, 0
The limit doesn't have to be three.
This is what I currently have that does something similar to this:
array = []
for c in range(26):
for b in range(26):
for a in range(26):
array.append((c, b, a))
I don't want leading zeroes but I can remove them, but I can't figure out how to do this with a variable number of elements.
What I want is a function that takes two arguments, limit (or base) and number of tuples to be returned, and returns the first n such tuples in order.
This must be very simple, but I just can't figure it out, and Google returns completely irrelevant results, so I am asking for help here.
How can this be done? Any help will truly be appreciated!
Hmm, I was thinking about something like this, but very unfortunately I can't make it work, please help me figure out why it doesn't work and how to make it work:
array = []
numbers = [0]
for i in range(1000):
numbers[-1] += 1
while 26 in numbers:
index = numbers.index(26)
numbers[index:] = [0] * (len(numbers) - index)
if index != 0:
numbers[index - 1] += 1
else:
numbers.insert(0, 1)
array.append(numbers)
I don't quite understand it, my testing shows everything inside the loop work perfectly fine outside the loop, the results are correct, but it just simply magically will not work in a loop, I don't know the reason for this, it is very strange.
I discovered the fact that if I change the last line to print(numbers) then everything prints correctly, but if I use append only the last element will be added, how so?
from math import log
def number_to_base(n,base):
number=[]
for digit in range(int(log(n+0.500001,base)),-1,-1):
number.append(n//base**digit%base)
return number
def first_numbers_in_base(n,base):
numbers=[]
for i in range(n):
numbers.append(tuple(number_to_base(i,base)))
return numbers
#tests:
print(first_numbers_in_base(10,3))
print(number_to_base(1048,10))
print(number_to_base(int("10201122110212",3),3))
print(first_numbers_in_base(25,10))
I finally did it!
The logic is very simple, but the hard part is to figure out why it won't work in a loop, turns out I need to use .copy(), because for whatever reason, doing an in-place modification to a list directly modifies the data reside in its memory space, such behavior modifies the same memory space, and .append() method always appends the latest data in a memory space.
So here is the code:
def steps(base, num):
array = []
numbers = [0]
for i in range(num):
copy = numbers.copy()
copy[-1] += 1
while base in copy:
index = copy.index(base)
copy[index:] = [0] * (len(copy) - index)
if index != 0:
copy[index - 1] += 1
else:
copy.insert(0, 1)
array.append(copy)
numbers = copy
return array
Use it like this:
steps(26, 1000)
For the first 1000 lists in base 26.
Here is a a function, that will satisfy original requirements (returns list of tuples, first tuple represents 0) and is faster than other functions that have been posted to this thread:
def first_numbers_in_base(n,base):
if n<2:
if n:
return [(0,)]
return []
numbers=[(0,),(1,)]
base-=1
l=-1
num=[1]
for i in range(n-2):
if num[-1]==base:
num[-1]=0
for i in range(l,-1,-1):
if num[i]==base:
num[i]=0
else:
num[i]+=1
break
else:
num=[1]+num
l+=1
else:
num[-1]+=1
numbers.append(tuple(num))#replace tuple(num) with num.copy() if you want resutl to contain lists instead of tuples.
return numbers

Check if differences between elements already exists in a list

I'm trying to build a heuristic for the simplest feasible Golomb Ruler as possible. From 0 to n, find n numbers such that all the differences between them are different. This heuristic consists of incrementing the ruler by 1 every time. If a difference already exists on a list, jump to the next integer. So the ruler starts with [0,1] and the list of differences = [ 1 ]. Then we try to add 2 to the ruler [0,1,2], but it's not feasible, since the difference (2-1 = 1) already exists in the list of differences. Then we try to add 3 to the ruler [0,1,3] and it is feasible, and thus the list of differences becomes [1,2,3] and so on. Here's what I've come to so far:
n = 5
positions = list(range(1,n+1))
Pos = []
Dist = []
difs = []
i = 0
while (i < len(positions)):
if len(Pos)==0:
Pos.append(0)
Dist.append(0)
elif len(Pos)==1:
Pos.append(1)
Dist.append(1)
else:
postest = Pos + [i] #check feasibility to enter the ruler
difs = [a-b for a in postest for b in postest if a > b]
if any (d in difs for d in Dist)==True:
pass
else:
for d in difs:
Dist.append(d)
Pos.append(i)
i += 1
However I can't make the differences check to work. Any suggestions?
For efficiency I would tend to use a set to store the differences, because they are good for inclusion testing, and you don't care about the ordering (possibly until you actually print them out, at which point you can use sorted).
You can use a temporary set to store the differences between the number that you are testing and the numbers you currently have, and then either add it to the existing set, or else discard it if you find any matches. (Note else block on for loop, that will execute if break was not encountered.)
n = 5
i = 0
vals = []
diffs = set()
while len(vals) < n:
diffs1 = set()
for j in reversed(vals):
diff = i - j
if diff in diffs:
break
diffs1.add(diff)
else:
vals.append(i)
diffs.update(diffs1)
i += 1
print(vals, sorted(diffs))
The explicit loop over values (rather than the use of any) is to avoid unnecessarily calculating the differences between the candidate number and all the existing values, when most candidate numbers are not successful and the loop can be aborted early after finding the first match.
It would work for vals also to be a set and use add instead of append (although similarly, you would probably want to use sorted when printing it). In this case a list is used, and although it does not matter in principle in which order you iterate over it, this code is iterating in reverse order to test the smaller differences first, because the likelihood is that unusable candidates are rejected more quickly this way. Testing it with n=200, the code ran in about 0.2 seconds with reversed and about 2.1 without reversed; the effect is progressively more noticeable as n increases. With n=400, it took 1.7 versus 27 seconds with and without the reversed.

Troubles with matrix in python

Yesterday, I was trying to solve a problem with python, and I encountered something really odd:
# create matrix
for i in range(N):
tmp.append(0)
for i in range(N):
marker.append(tmp)
# create sum of 2 first cards
for i in range(N) :
for j in range(N):
if i != j and marker[i][j] == 0:
comCard.append(deck[i]+deck[j])
taken[deck[i]+deck[j]] = [i,j]
marker[i][j] = 1
marker[j][i] = 1
The idea is that I want to calculate all the possible sums of each pair of cards in the deck (these cards need to be different), so I think that with a marker, I can avoid calculating the same 2 cards again. for example: deck[1]+deck[2] and deck[2]+deck[1]. But these lines didn't work as they were supposed to do:
marker[i][j] = 1
marker[j][i] = 1
I can recommend another way using standard python modules:
# suppose this is a deck - list of cards (I don't know how to mark them :)
>>> deck = ['Ax', 'Dy', '8p', '6a']
>>> from itertools import combinations
# here's a list of all possible combinations of 2 different cards
>>> list(combinations(deck, 2)))
[('Ax', 'Dy'), ('Ax', '8p'), ('Ax', '6a'), ('Dy', '8p'), ('Dy', '6a'), ('8p', '6a')]
You may work with this list: check some combinations and so on.
I recommend to pay attention to library itertools - it's really awesome for such type computations!
You're using the same instance of tmp, only shallow-copied. That's why it doesn't work. You need a new copy of each line to add in your matrix.
This can be done with:
marker.append(list(tmp))
Also, you may benefit from using True and False instead of 0 and 1 someday. So the initialisation of your matrix could rahter look like this:
tmp = list()
marker = list()
for i in range(N):
tmp.append(False)
for j in range(N):
marker.append(list(tmp))
This way, when you try marker[i][j] = True, only the index (i, j) will be affected.
That being said, Eugene Lisitsky's answer gives you a far more adapted tool for this kind of matter (permutation listing).

Random contiguous slice of list in Python based on a single random integer

Using a single random number and a list, how would you return a random slice of that list?
For example, given the list [0,1,2] there are seven possibilities of random contiguous slices:
[ ]
[ 0 ]
[ 0, 1 ]
[ 0, 1, 2 ]
[ 1 ]
[ 1, 2]
[ 2 ]
Rather than getting a random starting index and a random end index, there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
I need it that way, to ensure these 7 possibilities have equal probability.
Simply fix one order in which you would sort all possible slices, then work out a way to turn an index in that list of all slices back into the slice endpoints. For example, the order you used could be described by
The empty slice is before all other slices
Non-empty slices are ordered by their starting point
Slices with the same starting point are ordered by their endpoint
So the index 0 should return the empty list. Indices 1 through n should return [0:1] through [0:n]. Indices n+1 through n+(n-1)=2n-1 would be [1:2] through [1:n]; 2n through n+(n-1)+(n-2)=3n-3 would be [2:3] through [2:n] and so on. You see a pattern here: the last index for a given starting point is of the form n+(n-1)+(n-2)+(n-3)+…+(n-k), where k is the starting index of the sequence. That's an arithmetic series, so that sum is (k+1)(2n-k)/2=(2n+(2n-1)k-k²)/2. If you set that term equal to a given index, and solve that for k, you get some formula involving square roots. You could then use the ceiling function to turn that into an integral value for k corresponding to the last index for that starting point. And once you know k, computing the end point is rather easy.
But the quadratic equation in the solution above makes things really ugly. So you might be better off using some other order. Right now I can't think of a way which would avoid such a quadratic term. The order Douglas used in his answer doesn't avoid square roots, but at least his square root is a bit simpler due to the fact that he sorts by end point first. The order in your question and my answer is called lexicographical order, his would be called reverse lexicographical and is often easier to handle since it doesn't depend on n. But since most people think about normal (forward) lexicographical order first, this answer might be more intuitive to many and might even be the required way for some applications.
Here is a bit of Python code which lists all sequence elements in order, and does the conversion from index i to endpoints [k:m] the way I described above:
from math import ceil, sqrt
n = 3
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
b = 1 - 2*n
c = 2*(i - n) - 1
# solve k^2 + b*k + c = 0
k = int(ceil((- b - sqrt(b*b - 4*c))/2.))
m = k + i - k*(2*n-k+1)//2
print("{:3} [{}:{}]".format(i, k, m))
The - 1 term in c doesn't come from the mathematical formula I presented above. It's more like subtracting 0.5 from each value of i. This ensures that even if the result of sqrt is slightly too large, you won't end up with a k which is too large. So that term accounts for numeric imprecision and should make the whole thing pretty robust.
The term k*(2*n-k+1)//2 is the last index belonging to starting point k-1, so i minus that term is the length of the subsequence under consideration.
You can simplify things further. You can perform some computation outside the loop, which might be important if you have to choose random sequences repeatedly. You can divide b by a factor of 2 and then get rid of that factor in a number of other places. The result could look like this:
from math import ceil, sqrt
n = 3
b = n - 0.5
bbc = b*b + 2*n + 1
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
k = int(ceil(b - sqrt(bbc - 2*i)))
m = k + i - k*(2*n-k+1)//2
print("{:3} [{}:{}]".format(i, k, m))
It is a little strange to give the empty list equal weight with the others. It is more natural for the empty list to be given weight 0 or n+1 times the others, if there are n elements on the list. But if you want it to have equal weight, you can do that.
There are n*(n+1)/2 nonempty contiguous sublists. You can specify these by the end point, from 0 to n-1, and the starting point, from 0 to the endpoint.
Generate a random integer x from 0 to n*(n+1)/2.
If x=0, return the empty list. Otherwise, x is unformly distributed from 1 through n(n+1)/2.
Compute e = floor(sqrt(2*x)-1/2). This takes the values 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, etc.
Compute s = (x-1) - e*(e+1)/2. This takes the values 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...
Return the interval starting at index s and ending at index e.
(s,e) takes the values (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),...
import random
import math
n=10
x = random.randint(0,n*(n+1)/2)
if (x==0):
print(range(n)[0:0]) // empty set
exit()
e = int(math.floor(math.sqrt(2*x)-0.5))
s = int(x-1 - (e*(e+1)/2))
print(range(n)[s:e+1]) // starting at s, ending at e, inclusive
First create all possible slice indexes.
[0:0], [1:1], etc are equivalent, so we include only one of those.
Finally you pick a random index couple, and apply it.
import random
l = [0, 1, 2]
combination_couples = [(0, 0)]
length = len(l)
# Creates all index couples.
for j in range(1, length+1):
for i in range(j):
combination_couples.append((i, j))
print(combination_couples)
rand_tuple = random.sample(combination_couples, 1)[0]
final_slice = l[rand_tuple[0]:rand_tuple[1]]
print(final_slice)
To ensure we got them all:
for i in combination_couples:
print(l[i[0]:i[1]])
Alternatively, with some math...
For a length-3 list there are 0 to 3 possible index numbers, that is n=4. You have 2 of them, that is k=2. First index has to be smaller than second, therefor we need to calculate the combinations as described here.
from math import factorial as f
def total_combinations(n, k=2):
result = 1
for i in range(1, k+1):
result *= n - k + i
result /= f(k)
# We add plus 1 since we included [0:0] as well.
return result + 1
print(total_combinations(n=4)) # Prints 7 as expected.
there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
It is difficult to say what method is best but if you're only interested in binding single random number to your contiguous slice you can use modulo.
Given a list l and a single random nubmer r you can get your contiguous slice like that:
l[r % len(l) : some_sparkling_transformation(r) % len(l)]
where some_sparkling_transformation(r) is essential. It depents on your needs but since I don't see any special requirements in your question it could be for example:
l[r % len(l) : (2 * r) % len(l)]
The most important thing here is that both left and right edges of the slice are correlated to r. This makes a problem to define such contiguous slices that wont follow any observable pattern. Above example (with 2 * r) produces slices that are always empty lists or follow a pattern of [a : 2 * a].
Let's use some intuition. We know that we want to find a good random representation of the number r in a form of contiguous slice. It cames out that we need to find two numbers: a and b that are respectively left and right edges of the slice. Assuming that r is a good random number (we like it in some way) we can say that a = r % len(l) is a good approach.
Let's now try to find b. The best way to generate another nice random number will be to use random number generator (random or numpy) which supports seeding (both of them). Example with random module:
import random
def contiguous_slice(l, r):
random.seed(r)
a = int(random.uniform(0, len(l)+1))
b = int(random.uniform(0, len(l)+1))
a, b = sorted([a, b])
return l[a:b]
Good luck and have fun!

Best way to compare two large sets of strings in Python

I am using Python (and have access to pandas, numpy, scipy).
I have two sets strings set A and set B. Each set A and B contains c. 2000 elements (each element being a string). The strings are around 50-100 characters long comprising up to c. 20 words (these sets may get much larger).
I wish to check if an member of set A is also a member of set B.
Now I am thinking a naive implementation can be visualised as a matrix where members in A and B are compared to one another (e.g. A1 == B1, A1 == B2, A1 == B3 and so on...) and the booleans (0, 1) from the comparison comprise the elements of the matrix.
What is the best way to implement this efficiently?
Two further elaborations:
(i) I am also thinking that for larger sets I may use a Bloom Filter (e.g. using PyBloom, pybloomfilter) to hash each string (i.e. I dont mind fasle positives so much...). Is this a good approach or are there other strategies I should consider?
(ii) I am thinking of including a Levenshtein distance match between strings (which I know can be slow) as I may need fuzzy matches - is there a way of combining this with the approach in (i) or otherwise making it more efficient?
Thanks in advance for any help!
Firstly, 2000 * 100 chars is'nt that big, you could use a set directly.
Secondly, if your strings are sorted, there is a quick way (which I found here)to compare them, as follows:
def compare(E1, E2):
i, j = 0, 0
I, J = len(E1), len(E2)
while i < I:
if j >= J or E1[i] < E2[j]:
print(E1[i], "is not in E2")
i += 1
elif E1[i] == E2[j]:
print(E1[i], "is in E2")
i, j = i + 1, j + 1
else:
j += 1
It is certainly slower than using a set, but it doesn't need the strings to be hold into memory (only two are needed at the same time).
For the Levenshtein thing, there is a C module which you can find on Pypi, and which is quite fast.
As mentioned in the comments:
def compare(A, B):
return list(set(A).intersection(B))
This is a modified version of the function that #michaelmeyer presented here https://stackoverflow.com/a/17264117/362951 - in his answer to the question on top of the page we are on.
The modified version below works also on unsorted data, because the function now includes the sorting.
This should not be a performance or resource problem in many cases, because python sorting is very effective. And presorting also helps.
Please note that the 'output' is now in sorted order too. This will differ from the original order of the first parameter, if it was unsorted.
Otherwise the sorting won't hurt much, even if both data sets are already sorted.
But if you want to suppress the sorting, in case both data sets are known to be sorted in ascending order already, call it like this:
compare(my_data1,my_data2,data_is_sorted=True)
Otherwise:
compare(my_data1,my_data2)
and the function accepts unordered data.
This is the modified version. Only the first two lines were added and a third optional parameter:
def compare(E1, E2, data_is_sorted=False):
if not data_is_sorted:
E1=sorted(E1)
E2=sorted(E2)
i, j = 0, 0
I, J = len(E1), len(E2)
while i < I:
if j >= J or E1[i] < E2[j]:
print(E1[i], "is not in E2")
i += 1
elif E1[i] == E2[j]:
print(E1[i], "is in E2")
i, j = i + 1, j + 1
else:
j += 1

Categories

Resources