Please consider the below algorithm:
for(j1 = n upto 0)
for(j2 = n-j1 upto 0)
for(j3 = n-j1-j2 upto 0)
.
.
for (jmax = n -j1 - j2 - j_(max-1))
{
count++;
product.append(j1 * j2 ... jmax); // just an example
}
Some relevant points about the algorithm snippet above:
I have listed an algorithm with a variable number of for loops.
The result that I calculate in each innermost loop is appended to a list. This list will grow to a length of 'count'.
Is this problem a suitable candidate for recursion? If yes, I am really not sure how to break the problem up. I am trying to code this up in Python, and I do not expect any code from you guys, just some pointers or examples in the right direction. Thank you.
Here is an initial try for a sample case http://pastebin.com/PiLNTWED
Your algorithm is finding all the m-tuples (m being the max subscript of j from your pseudocode) of non-negative integers that add up to n or less. In Python, the most natural way of expressing that would be with a recursive generator:
def gen_tuples(m, n):
    if m == 0:
        yield ()
    else:
        for x in range(n, -1, -1):
            for sub_result in gen_tuples(m-1, n-x):
                yield (x,)+sub_result
Example output:
>>> for x, y, z in gen_tuples(3, 3):
...     print(x, y, z)
3 0 0
2 1 0
2 0 1
2 0 0
1 2 0
1 1 1
1 1 0
1 0 2
1 0 1
1 0 0
0 3 0
0 2 1
0 2 0
0 1 2
0 1 1
0 1 0
0 0 3
0 0 2
0 0 1
0 0 0
You could also consider using permutations, combinations or product from the itertools module.
If you want all the possible combinations of i, j, k, ... (i.e. nested for loops)
you can use:
from itertools import product

for p in product(range(n), repeat=depth):
    # p == (j1, j2, j3, ...): one value per level of nesting
    pass  # do stuff here
But beware, the number of iterations in the loop grows exponentially!
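As a rough sketch of that trade-off, the bounded tuples from the question can also be produced by filtering product(). The helper name tuples_via_product is made up for this illustration. It visits all (n + 1) ** m candidates and discards most of them, which is why a recursive generator scales better as m grows.

```python
from itertools import product

def tuples_via_product(m, n):
    """Hypothetical helper: all m-tuples of non-negative ints with sum <= n."""
    # product() walks every one of the (n + 1) ** m candidate tuples ...
    candidates = product(range(n, -1, -1), repeat=m)
    # ... and the filter keeps only those within the bound.
    return [t for t in candidates if sum(t) <= n]

result = tuples_via_product(3, 3)
print(len(result), "kept out of", (3 + 1) ** 3)  # 20 kept out of 64
```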
The toy example will translate into a kind of tail recursion, so personally I wouldn't expect a recursive version to be any easier to review or maintain.
However, to get acquainted with the principle, try to factor out the invariant parts and common terms of the individual loops and identify a pattern (and, ideally, prove it afterwards). From that you should be able to fix the signature of the recursive procedure. Then flesh it out with the parts inherent to the loop bodies, and don't forget the termination condition.
Typically, if you want to transform for loops into recursive calls, you will need to replace the for statements with if statements. For nested loops, you will transform these into function calls.
For practice, start with a dumb translation of the code that works and then attempt to see where you can optimize later.
To give you an idea to try to apply to your situation, I would translate something like this:
results = []
for i in range(n):
    results.append(do_stuff(i, n))
to something like this:
results = []
def loop(n, results, i=0):
    if i >= n:
        return results
    results.append(do_stuff(i, n))
    i += 1
    return loop(n, results, i)  # pass the result back up the call chain
There are different ways to handle returning the results list, but you can adapt this to your needs.
-- As a response to the excellent listing by Blckgnht -- Consider here the case of n = 2 and max = 3
def simpletest():
    '''
    I am going to just test the algo listing with the assumption
    degree n = 2
    max = dim(m_p(n-1)) = 3,
    so j1, j2 and up to j3 are required for every entry into m_p(degree2).
    Let's just print j1, j2, j3 to verify that the function works
    before moving to the general version where the number of for loops is not known
    '''
    n = 2
    count = 0
    for j1 in range(n, -1, -1):
        for j2 in range(n - j1, -1, -1):
            j3 = n - (j1 + j2)
            count = count + 1
            print('To calculate m_p(%d)[%d], j1,j2,j3 = ' % (n, count), j1, j2, j3)
    assert count == 6  # just a checkpoint. See P.169 for a proof
    print('No. of entries =', count)
The output of this code (which is correct):
In [54]: %run _myCode/Python/invariant_hack.py
To calculate m_p(2)[1], j1,j2,j3 = 2 0 0
To calculate m_p(2)[2], j1,j2,j3 = 1 1 0
To calculate m_p(2)[3], j1,j2,j3 = 1 0 1
To calculate m_p(2)[4], j1,j2,j3 = 0 2 0
To calculate m_p(2)[5], j1,j2,j3 = 0 1 1
To calculate m_p(2)[6], j1,j2,j3 = 0 0 2
No. of entries = 6
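The hand-written loops above fix the last index to whatever remains of n, so every tuple sums to exactly n. A minimal sketch of how that might generalize to an arbitrary number of indices follows. The name gen_exact is made up here, and it assumes m >= 1.

```python
def gen_exact(m, n):
    """Hypothetical helper: yield all m-tuples of non-negative ints summing to exactly n."""
    if m == 1:
        # The last index is forced: it takes whatever is left of n.
        yield (n,)
    else:
        for x in range(n, -1, -1):
            for rest in gen_exact(m - 1, n - x):
                yield (x,) + rest

# Same order as the listing above for n = 2 and three indices.
for count, (j1, j2, j3) in enumerate(gen_exact(3, 2), start=1):
    print(count, j1, j2, j3)
```

For m = 3 and n = 2 this yields the same six tuples, in the same order, as the simpletest() output.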
These are the iterations:
i = 0
s_array[i].append(f_array[i][i])
s_array[i].append(f_array[i+1][i])
s_array[i].append(f_array[i+2][i])
s_array[i+1].append(f_array[i][i+1])
s_array[i+1].append(f_array[i+1][i+1])
s_array[i+2].append(f_array[i][i+2])
I want to convert these iterations into a for loop,
for example like this:
for i in range(something):
    for j in range(something):
        s_array[i].append(f_array[j][i])
I tried many things by trial and error, but didn't find a solution.
The equivalent iterations:
for i in range(3):
    for j in range(3 - i):
        s_array[i].append(f_array[j][i])
For example:
for i in range(3):
    for j in range(3 - i):
        print(i,"-->", j, i)
    print("")
Output:
0 --> 0 0
0 --> 1 0
0 --> 2 0
1 --> 0 1
1 --> 1 1
2 --> 0 2
Since you are appending values to an array in loops, nested for loops work as you have indicated. Because fewer values are appended on each pass of the outer loop, shrink the bound of the inner range() as the outer index grows, so that the inner loop runs fewer times.
Try doing something like this:
for i in range(3):
    for j in range(3-i):
        s_array[i].append(f_array[j][i])
Hopefully, this should solve your problem.
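For a quick check of the shrinking inner bound, here is a small self-contained sketch. The contents of f_array are made up for illustration, and a nested list comprehension stands in for the explicit append calls.

```python
# Made-up 3x3 input; the question does not show the real f_array.
f_array = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
size = len(f_array)

# Column i of f_array, truncated to its first (size - i) rows.
s_array = [[f_array[j][i] for j in range(size - i)] for i in range(size)]
print(s_array)  # [[1, 4, 7], [2, 5], [3]]
```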
My goal is to speed up the creation of a list of combinations by using my GPU. How can I accomplish this?
By way of example, the following code creates a list of 260 text strings ranging from "aa" through "jz". We then use itertools combinations_with_replacement() to create all possible combinations of R elements of this list. The use of timeit shows that, beyond 3 elements, extracting a list of these combinations slows exponentially. I suspect this can be done with numba cuda, but I don't know how.
import timeit
timeit.timeit('''
from itertools import combinations_with_replacement
combo_count = 2
alphabet = 'a'
alpha_list = []
item_list = []
for i in range(0,26):
    alpha_list.append(alphabet)
    alphabet = chr(ord(alphabet) + 1)
for first_letter in alpha_list[0:10]:
    for second_letter in alpha_list:
        item_list.append(first_letter+second_letter)
print("Length of item list:",len(item_list))
combos = combinations_with_replacement(item_list,combo_count)
cmb_lst = [bla for bla in combos]
print("Length of list of all {} combinations: {}".format(combo_count,len(cmb_lst)))
''', number=1)
As mentioned in the comments, there is no way to "vectorize" the combinations_with_replacement() call from the itertools library directly (with Numba CUDA). Numba CUDA doesn't work that way.
However, I believe it should be possible to generate an equivalent result dataset using Numba CUDA, in a way that runs faster than the itertools library function for certain cases. There are probably a number of ways to accomplish this, and I make no claim that the method I describe is optimal. It certainly is not, and could be made to run faster. However, according to my testing, even this not-very-optimized approach can run a particular test case roughly 10x faster than Python's itertools on a V100 GPU.
As background, I consider this and this (or equivalent material) to be essential reading.
From the above, the formula for the number of combinations of n items with k choices, with replacement, is given by:
(n-1+k)!
-------- (Formula 1)
(n-1)!k!
In the code below, I have encapsulated the above calculation in count_comb_with_repl (device) and host_count_comb_with_repl (host) functions. It turns out we can use this one basic calculation, with a cascading-smaller sequence of choices for n and k, to drive the entire calculation process to compute a combination given only an index into the final result array. To visualize what we are doing, it helps to have a simple example. Let's take the case of 3 items, and 3 choices. Indexing items from zero, the array of possibilities looks like this:
n = 3, k = 3
index   choices   first digit calculation
0       0,0,0     -----------------
1       0,0,1
2       0,0,2
3       0,1,1     equivalent to n = 3, k = 2
4       0,1,2
5       0,2,2     -----------------
6       1,1,1     -----------------
7       1,1,2     equivalent to n = 2, k = 2
8       1,2,2     -----------------
9       2,2,2     equivalent to n = 1, k = 2
The length of the above list is given by plugging n = 3 and k = 3 into Formula 1, which gives 10. The key observation is that the first digit of a result can be computed from its index alone, by locating the dividing points between the ranges that start with 0, 1 and 2. The results whose first choice is 0 form a range whose length is given by plugging n = 3 and k = 2 into Formula 1, which gives 6. Therefore, if the given index is less than 6, the first digit is 0. If it is greater than or equal to 6, the first digit is 1 or 2; after subtracting 6 from the index, we compute the length of the next range (first digit 1, equivalent to n = 2 and k = 2, which gives 3) and check whether the offset index falls within it.
Once we know the first digit, we can repeat the process (with suitable list reduction and offsetting) to find the next digit, and the next digit, etc.
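To make the index-to-combination mapping concrete, here is a CPU-only sketch of the same idea, checked against itertools for the 3-item example. It is not the GPU code; the names count_cwr and unrank_cwr are made up for this illustration, and math.comb assumes Python 3.8 or later.

```python
from itertools import combinations_with_replacement
from math import comb

def count_cwr(n, k):
    """Formula 1: number of k-combinations with replacement from n items."""
    return comb(n - 1 + k, k)

def unrank_cwr(n, k, index):
    """Hypothetical helper: the combination at position `index` in lexicographic order."""
    result = []
    first = 0  # smallest item still allowed (digits never decrease)
    for remaining in range(k, 0, -1):
        for digit in range(first, n):
            # Size of the block of results that have `digit` at this position.
            block = count_cwr(n - digit, remaining - 1)
            if index < block:
                result.append(digit)
                first = digit
                break
            index -= block  # skip this block and offset the index
    return tuple(result)

expected = list(combinations_with_replacement(range(3), 3))
print(all(unrank_cwr(3, 3, i) == c for i, c in enumerate(expected)))  # True
print(count_cwr(260, 4))  # 194831715
```

Because each index is decoded independently of the others, the work can be spread over one GPU thread per index.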
Here is Python code that implements the above method. As I mentioned, for a test case of n=260 and k=4 this runs in less than 3 seconds on my V100.
$ cat t2.py
from numba import cuda,jit
import numpy as np

@cuda.jit(device=True)
def get_next_count_comb_with_repl(n,k,prev):
    return int(round((prev*(n))/(n+k)))

@cuda.jit(device=True)
def count_comb_with_repl(n,k):
    mymax = max(n-1,k)
    ans = 1.0
    cnt = 1
    for i in range(mymax+1, n+k):
        ans = ans*i/cnt
        cnt += 1
    return int(round(ans))

#intended to be identical to the previous function
#I just need a version I can call from host code
def host_count_comb_with_repl(n,k):
    mymax = max(n-1,k)
    ans = 1.0
    cnt = 1
    for i in range(mymax+1, n+k):
        ans = ans*i/cnt
        cnt += 1
    return int(round(ans))

@cuda.jit(device=True)
def find_first_digit(n,k,i):
    psum = 0
    count = count_comb_with_repl(n, k-1)
    if (i-psum) < count:
        return 0,psum
    psum += count
    for j in range(1,n):
        count = get_next_count_comb_with_repl(n-j,k-1,count)
        if (i-psum) < count:
            return j,psum
        psum += count
    return -1,0 # error

@cuda.jit
def kernel_count_comb_with_repl(n, k, l, r):
    # grid-stride loop: each thread handles one or more result indices
    for i in range(cuda.grid(1), l, cuda.gridsize(1)):
        new_ll = n
        new_cc = k
        new_i = i
        new_digit = 0
        for j in range(k):
            digit,psum = find_first_digit(new_ll, new_cc, new_i)
            new_digit += digit
            new_ll -= digit
            new_cc -= 1
            new_i -= psum
            r[i+j*l] = new_digit

combo_count = 4
ll = 260
cl = host_count_comb_with_repl(ll, combo_count)
print(cl)
# bug if cl > 2G
if cl < 2**31:
    my_dtype = np.uint8
    if ll > 255:
        my_dtype = np.uint16
    r = np.empty(cl*combo_count, dtype=my_dtype)
    d_r = cuda.device_array_like(r)
    block = 256
    grid = (cl//block)+1
    #grid = 640
    kernel_count_comb_with_repl[grid,block](ll, combo_count, cl, d_r)
    r = d_r.copy_to_host()
    print(r.reshape(combo_count,cl))
$ time python t2.py
194831715
[[ 0 0 0 ... 258 258 259]
[ 0 0 0 ... 258 259 259]
[ 0 0 0 ... 259 259 259]
[ 0 1 2 ... 259 259 259]]
real 0m2.212s
user 0m1.110s
sys 0m1.077s
$
(The above test case: n=260, k = 4, takes ~30s on my system using OP's code.)
This should be considered to be a sketch of an idea. I make no claims that it is defect free. This type of problem can quickly exhaust the memory on a GPU (for large enough choices of n and/or k), and your only indication of that would probably be a crude out of memory error from numba.
Yes, the above code does not produce concatenations of aa through jz, but that is just an indexing exercise on the result. You would use the result indices to index into your array of items, as needed, to convert a result like 0,0,0,1 to a result like aa,aa,aa,ab.
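As a rough illustration of that indexing step, the sketch below maps index combinations back to the "aa" through "jz" strings in plain Python. The helper to_strings and the tiny rows sample are made up; rows mimics the (k, count) layout printed above, where each column is one combination.

```python
import string

# The 260 items from the question: "aa" through "jz".
letters = string.ascii_lowercase
items = [a + b for a in letters[:10] for b in letters]

def to_strings(index_combo, items):
    """Hypothetical helper: turn one combination of indices into the items they name."""
    return tuple(items[i] for i in index_combo)

# Made-up sample in the (k, count) layout: two combinations with k = 4.
rows = [[0, 0],
        [0, 0],
        [0, 0],
        [0, 1]]
combos = [to_strings(column, items) for column in zip(*rows)]
print(combos[1])  # ('aa', 'aa', 'aa', 'ab')
```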
This isn't a performance win across the board. The Python method is still faster for smaller test cases, and larger test cases (e.g. n = 260, k = 5) will exceed available memory on the GPU.
I would like to have three values increment at different speeds. My overall goal is to emulate this pattern:
0,0, 0
0,1, 1
0,2, 2
1,0, 3
1,1, 4
1,2, 5
2,0, 6
2,1, 7
2,2, 8
The first two numbers are easy. I would solve it like this:
for x in range(3):
    for y in range(3):
        print(x, y)
>>> 0 0
>>> 0 1
>>> 0 2
>>> 1 0
>>> 1 1
>>> 1 2
>>> 2 0
>>> 2 1
>>> 2 2
This is the pattern that I want.
The question is how do I increment the third number by one each time, while still having the first two numbers increment in the way that they are?
Basically, how can I make the third number increment by one each time the for loop goes?
Since we have all these answers, I will post the most straightforward one
count = 0
for x in range(3):
    for y in range(3):
        print(x, y, count)
        count += 1
You don't need nested loops for this. You can use itertools.product to get your first two numbers, and enumerate to get your last one.
from itertools import product
for i, (u, v) in enumerate(product(range(3), repeat=2)):
    print(u, v, i)
output
0 0 0
0 1 1
0 2 2
1 0 3
1 1 4
1 2 5
2 0 6
2 1 7
2 2 8
itertools.product is a very handy function. It basically performs nested loops efficiently, but its main benefit is that they don't look nested, so you don't end up with massively indented code. However, its real strength comes when you don't know how many nested loops you need until runtime.
enumerate is probably even more useful: it lets you iterate over a sequence or any iterable and returns the iterable's items as well as an index number. So whenever you need to loop over a list but you need the list items and their indices as well, it's more efficient to use enumerate to get them both at once, rather than having a loop that uses range to produce the index and then using the index to fetch the list item.
The third number counts how many total iterations you had so far. For each increment in X it gains the total size of Y's loop, and to that you need to add the value of Y:
X_SIZE = 3
Y_SIZE = 3
for x in range(X_SIZE):
    for y in range(Y_SIZE):
        print(x, y, x * Y_SIZE + y)
Single variable, single loop:
for i in range(9):
    print(i // 3, i % 3, i)
// is floor division and % is modulus (the remainder, in most cases)
Personally, I like this solution because it plainly explains the underlying pattern, and can therefore be easily altered or extended to other patterns.
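One way this pattern might be extended to ranges of different sizes is with divmod, which returns the quotient and remainder in one call. This is only a sketch; the generator name counters is made up for the example.

```python
def counters(x_size, y_size):
    """Hypothetical helper: yield (x, y, i) as a pair of nested loops would, using one loop."""
    for i in range(x_size * y_size):
        # Divide by the size of the inner (fast) counter, not the outer one.
        x, y = divmod(i, y_size)
        yield x, y, i

for x, y, i in counters(2, 3):
    print(x, y, i)
```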
Try this:
for x in range(3):
    for y in range(3):
        print(x, y, x * 3 + y)  # Python 3.x
        # print x, y, x * 3 + y  # Python 2.x
Hope this helps.
You can simply use a count variable for this:
count = 0
for x in range(3):
    for y in range(3):
        print(x, y, ' ' ,count) # use ' ' for making exact look what OP asked..lol
        count = count + 1
This looks more natural to me :)
x_range = 3
y_range = 3
for x in range( x_range*y_range ):
    # divide by the size of the inner (second) counter
    print(x // y_range, x % y_range, x)
Similar to what cwharris wrote.
Assume I have the following matrix (defined here in Julia language):
mat = [1 1 0 0 0 ; 1 1 0 0 0 ; 0 0 0 0 1 ; 0 0 0 1 1]
Considering as a "component" a group of neighbour elements that have value '1', how to identify that this matrix has 2 components and which vertices compose each one?
For the matrix mat above I would like to find the following result:
Component 1 is composed by the following elements of the matrix (row,column):
(1,1)
(1,2)
(2,1)
(2,2)
Component 2 is composed by the following elements:
(3,5)
(4,4)
(4,5)
I can use Graph algorithms like this to identify components in square matrices. However such algorithms can not be used for non-square matrices like the one I present here.
Any idea will be much appreciated.
I am open if your suggestion involves the use of a Python library + PyCall. Although I would prefer to use a pure Julia solution.
Regards
Using Images.jl's label_components is indeed the easiest way to solve the core problem. However, your loop over 1:maximum(labels) may not be efficient: it's O(N*n), where N is the number of elements in labels and n the maximum, because you visit each element of labels n times.
You'd be much better off just visiting each element of labels just twice: once to determine the maximum, and once to assign each non-zero element to its proper group:
using Images
function collect_groups(labels)
    groups = [Int[] for i = 1:maximum(labels)]
    for (i,l) in enumerate(labels)
        if l != 0
            push!(groups[l], i)
        end
    end
    groups
end
mat = [1 1 0 0 0 ; 1 1 0 0 0 ; 0 0 0 0 1 ; 0 0 0 1 1]
labels = label_components(mat)
groups = collect_groups(labels)
Output on your test matrix:
2-element Array{Array{Int64,1},1}:
[1,2,5,6]
[16,19,20]
Calling library functions like find can occasionally be useful, but it's also a habit from slower languages that's worth leaving behind. In julia, you can write your own loops and they will be fast; better yet, often the resulting algorithm is much easier to understand. collect(zip(ind2sub(size(mat),find( x -> x == value, mat))...)) does not exactly roll off the tongue.
The answer is pretty simple (though I can't provide Python code):
1. Collect all 1s into a list.
2. Select an arbitrary element of the list from step 1, use any graph-traversal algorithm to visit all neighbouring 1s, and remove the visited 1s from the list.
3. Repeat step 2 until the list from step 1 is empty.
In pseudocode (using BFS):
//generate a list with the position of all 1s in the matrix
list pos
for int x in [0 , matrix_width[
    for int y in [0 , matrix_height[
        if matrix[x][y] == 1
            add(pos , {x , y})
while NOT isempty(pos)
    //traverse the graph using BFS
    list visited
    list next
    add(next , remove(pos , 0))
    while NOT isempty(next)
        pair p = remove(next , 0)
        add(visited , p)
        remove(pos , p)
        //p is part of the specific graph that is processed in this BFS
        //each repetition of the outer while-loop process a different graph that is part
        //of the matrix
        addall(next , distinct(visited , neighbour1s(p)))
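The pseudocode above could be sketched in Python roughly as follows. This is one possible reading rather than a definitive implementation: it assumes 4-connectivity (up, down, left, right), replaces the list of pending positions with a set of seen cells, and reports 1-based (row, column) pairs to match the question. The function name connected_components is made up.

```python
from collections import deque

def connected_components(matrix):
    """Hypothetical helper: groups of 4-connected cells equal to 1, as 1-based (row, col)."""
    rows, cols = len(matrix), len(matrix[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 1 or (r, c) in seen:
                continue
            # Breadth-first search starting from an unvisited 1.
            queue = deque([(r, c)])
            seen.add((r, c))
            component = []
            while queue:
                cr, cc = queue.popleft()
                component.append((cr + 1, cc + 1))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and matrix[nr][nc] == 1 and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            components.append(sorted(component))
    return components

mat = [[1, 1, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 1, 1]]
for number, cells in enumerate(connected_components(mat), start=1):
    print("Component", number, cells)
```

Nothing here requires the matrix to be square; only the bounds check depends on its shape.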
Just got an answer from julia-users mailing list that solves this problem using Images.jl, a library to work with images in Julia.
They developed a function called "label_components" to identify connected components in matrices.
Then I use a customized function called "findMat" to get the indices of such matrix of components for each component.
The answer, in Julia language:
using Images
function findMat(mat,value)
    return(collect(zip(ind2sub(size(mat),find( x -> x == value, mat))...)));
end
mat = [1 1 0 0 0 ; 1 1 0 0 0 ; 0 0 0 0 1 ; 0 0 0 1 1]
labels = label_components(mat);
for c in 1:maximum(labels)
    comp = findMat(labels,c);
    println("Component $c is composed by the following elements (row,col)");
    println("$comp\n");
end
How can I optimize this edit distance code, i.e. code that finds the number of bits changed between 2 values? E.g.
word1 = '010000001000011111101000001001000110001'
word2 = '010000001000011111101000001011111111111'
When I tried to run it on Hadoop, it took ages to complete.
How can I reduce the for loops and comparisons?
#!/usr/bin/python
import os, re, string, sys
from numpy import zeros
def calculateDistance(word1, word2):
    x = zeros( (len(word1)+1, len(word2)+1) )
    for i in range(0,len(word1)+1):
        x[i,0] = i
    for i in range(0,len(word2)+1):
        x[0,i] = i
    for j in range(1,len(word2)+1):
        for i in range(1,len(word1)+1):
            if word1[i-1] == word2[j-1]:
                x[i,j] = x[i-1,j-1]
            else:
                minimum = x[i-1, j] + 1
                if minimum > x[i, j-1] + 1:
                    minimum = x[i, j-1] + 1
                if minimum > x[i-1, j-1] + 1:
                    minimum = x[i-1, j-1] + 1
                x[i,j] = minimum
    return x[len(word1), len(word2)]
I looked for a bit counting algorithm online, and I found this page, which has several good algorithms. My favorite there is a one-line function which claims to work for Python 2.6 / 3.0:
return sum( b == '1' for b in bin(word1 ^ word2)[2:] )
I don't have Python, so I can't test, but if this one doesn't work, try one of the others. The key is to count the number of 1's in the bitwise XOR of your two words, because there will be a 1 for each difference.
You are calculating the Hamming distance, right?
EDIT: I'm trying to understand your algorithm, and the way you're manipulating the inputs, it looks like they are actually arrays, and not just binary numbers. So I would expect that your code should look more like:
return sum( a != b for a, b in zip(word1, word2) )
EDIT2: I've figured out what your code does, and it's not the Hamming distance at all! It's actually the Levenshtein distance, which counts how many additions, deletions, or substitutions are needed to turn one string into another (the Hamming distance only counts substitutions, and so is only suitable for equal length strings of digits). Looking at the Wikipedia page, your algorithm is more or less a straight port of the pseudocode they have there. As they point out, the time and space complexity of a comparison of strings of length m and n is O(mn), which is pretty bad. They have a few suggestions of optimizations depending on your needs, but I don't know what you use this function for, so I can't say what would be best for you. If the Hamming distance is good enough for you, the code above should suffice (time complexity O(n)), but it gives different results on some sets of strings, even if they are of equal length, like '0101010101' and '1010101010', which have Hamming distance 10 (flip all bits) and Levenshtein distance 2 (remove the first 0 and add it at the end)
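If the Hamming distance does turn out to be what is needed, the two one-liners above could be wrapped up as in this sketch. The function names are made up; hamming_bits assumes the inputs are strings of binary digits, and both assume equal lengths.

```python
def hamming_str(a, b):
    """Hypothetical helper: count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))

def hamming_bits(a, b):
    """Same count via XOR: parse the strings as binary and count the 1 bits."""
    return bin(int(a, 2) ^ int(b, 2)).count("1")

word1 = '010000001000011111101000001001000110001'
word2 = '010000001000011111101000001011111111111'
print(hamming_str(word1, word2) == hamming_bits(word1, word2))  # True
print(hamming_str('0101010101', '1010101010'))  # 10
```

Both run in time linear in the length of the words, compared with the O(mn) table filled in by the original code.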
Since you haven't specified which edit distance you're using yet, I'm going to go out on a limb and assume it's the Levenshtein distance. In that case, you can shave off some operations here and there:
def levenshtein(a,b):
    "Calculates the Levenshtein distance between a and b."
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n,m)) space.
        # Not really important to the algorithm anyway.
        a,b = b,a
        n,m = m,n
    current = list(range(n+1))
    for i in range(1,m+1):
        previous, current = current, [i]+[0]*n
        for j in range(1,n+1):
            add, delete = previous[j]+1, current[j-1]+1
            change = previous[j-1]
            if a[j-1] != b[i-1]:
                change = change + 1
            current[j] = min(add, delete, change)
    return current[n]
Edit: also, you make no mention of your dataset. Depending on its characteristics, the implementation might be changed to take advantage of them.
Your algorithm seems to do a lot of work. It compares every bit to all bits in the opposite bit vector, meaning you get an algorithmic complexity of O(m*n). That is unnecessary if you are computing Hamming distance, so I assume you're not.
Your loop builds an x[i,j] matrix looking like this:
0 1 0 0 0 0 0 0 1 0 0 ... (word1)
0 0 1 0 0 0 0 0 0 1
1 1 0 1 1 1 1 1 1 0
0 0 1 0 1 1 1 1 1 1
0 0 1 1 0 1 1 1 1 2
0 0 1 1 1 0 1 1 1 2
0 0 1 1 1 1 0 1 1 2
1
1
...
(example word2)
This may be useful for detecting certain types of edits, but without knowing what edit distance algorithm you are trying to implement, I really can't tell you how to optimize it.