Strange behaviour of simple pycuda kernel


I'm quite new to cuda and pycuda.
I need a kernel that creates a matrix (of dimension n x d) out of an array (1 x d), by simply "repeating" the same array n times:
for example, suppose we have n = 4 and d = 3, then if the array is [1 2 3]
the result of my kernel should be:
[1 2 3
1 2 3
1 2 3
1 2 3]
(a matrix 4x3).
Basically, it's the same as doing numpy.tile(array, (n, 1))
I've written the code below:
kernel_code_template = """
__global__ void TileKernel(float *in, float *out)
{
// Each thread computes one element of out
int y = blockIdx.y * blockDim.y + threadIdx.y;
int x = blockIdx.x * blockDim.x + threadIdx.x;
if (y > %(n)s || x > %(d)s) return;
out[y * %(d)s + x] = in[x];
}
"""
d = 64
n = 512
blockSizex = 16
blockSizey = 16
gridSizex = (d + blockSizex - 1) / blockSizex
gridSizey = (n + blockSizey - 1) / blockSizey
# get the kernel code from the template
kernel_code = kernel_code_template % {
'd': d,
'n': n
}
mod = SourceModule(kernel_code)
TileKernel = mod.get_function("TileKernel")
vec_cpu = np.arange(d).astype(np.float32) # just as an example
vec_gpu = gpuarray.to_gpu(vec_cpu)
out_gpu = gpuarray.empty((n, d), np.float32)
TileKernel.prepare("PP")
TileKernel.prepared_call((gridSizex, gridSizey), (blockSizex, blockSizey, 1), vec_gpu.gpudata, out_gpu.gpudata)
out_cpu = out_gpu.get()
Now, if I run this code with d equal to a power of 2 >= 16, I get the right result (the same as numpy.tile(vec_cpu, (n, 1))).
But if I set d to anything else (say 88), every element of the output matrix has the correct value except in the first column.
There, some entries are right and others hold a wrong value that looks random: it is the same for every wrong element within a run, but changes from run to run,
and which entries of the first column are wrong also changes from run to run.
Example:
[0 1 2
0 1 2
6 1 2
0 1 2
6 1 2
...]
I really can't figure out what is causing this problem, but maybe it's just something simple that I'm missing...
Any help will be appreciated, thanks in advance!

The bounds checking within your kernel code is incorrect. This
if (y > n || x > d) return;
out[y * d + x] = in[x];
should be:
if (y >= n || x >= d) return;
out[y * d + x] = in[x];
or better still:
if ((y < n) && (x < d))
out[y * d + x] = in[x];
Valid indices lie in 0 <= x < d and 0 <= y < n. By allowing x = d you get undefined behaviour: the thread reads in[d], one element past the end of the input, and writes that unknown value to out[y * d + d], which is the first entry of the next row of the output array. This explains why sometimes the results were correct and other times not.
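To see why only some values of d misbehave, the launch geometry can be emulated on the CPU. The following is a minimal sketch in plain Python, not PyCUDA code; the helper name stray_threads is hypothetical, and it assumes the 16 x 16 blocks and the grid-size formula from the question. It lists the threads that pass the guard even though their x or y is out of range:

```python
def stray_threads(n, d, block=16, fixed=False):
    """Return (y, x) for every emulated thread that passes the guard while out of range."""
    grid_x = (d + block - 1) // block  # same rounding-up as gridSizex
    grid_y = (n + block - 1) // block  # same rounding-up as gridSizey
    stray = []
    for y in range(grid_y * block):
        for x in range(grid_x * block):
            if fixed:
                skipped = y >= n or x >= d  # corrected guard
            else:
                skipped = y > n or x > d    # guard from the question
            if not skipped and (x >= d or y >= n):
                stray.append((y, x))
    return stray


print(len(stray_threads(512, 64)))              # d is a multiple of the block width
print(len(stray_threads(512, 88)))              # d = 88 with the original guard
print(len(stray_threads(512, 88, fixed=True)))  # d = 88 with the corrected guard
```

With d = 64 the grid covers exactly 64 columns, so no thread ever has x == d and the faulty guard is never exercised. With d = 88 the grid covers 96 columns, and in every row the thread with x == 88 slips through and writes to out[y * 88 + 88], the first column of the following row.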

Related

How do I optimise this function that generates pythagorean group of n elements (like triples but with any number of elements) using itertools? [duplicate]

This is a program I wrote to calculate Pythagorean triplets. When I run the program it prints each set of triplets twice because of the if statement. Is there any way I can tell the program to only print a new set of triplets once? Thanks.
import math

def main():
    for x in range (1, 1000):
        for y in range (1, 1000):
            for z in range(1, 1000):
                if x*x == y*y + z*z:
                    print y, z, x
                    print '-'*50

if __name__ == '__main__':
    main()

Pythagorean Triples make a good example for claiming "for loops considered harmful", because for loops seduce us into thinking about counting, often the most irrelevant part of a task.
(I'm going to stick with pseudo-code to avoid language biases, and to keep the pseudo-code streamlined, I'll not optimize away multiple calculations of e.g. x * x and y * y.)
Version 1:
for x in 1..N {
for y in 1..N {
for z in 1..N {
if x * x + y * y == z * z then {
// use x, y, z
}
}
}
}
is the worst solution. It generates duplicates, and traverses parts of the space that aren't useful (e.g. whenever z < y). Its time complexity is cubic on N.
Version 2, the first improvement, comes from requiring x < y < z to hold, as in:
for x in 1..N {
for y in x+1..N {
for z in y+1..N {
if x * x + y * y == z * z then {
// use x, y, z
}
}
}
}
which reduces run time and eliminates duplicated solutions. However, it is still cubic on N; the improvement is just a reduction of the co-efficient of N-cubed.
It is pointless to continue examining increasing values of z after z * z < x * x + y * y no longer holds. That fact motivates Version 3, the first step away from brute-force iteration over z:
for x in 1..N {
for y in x+1..N {
z = y + 1
while z * z < x * x + y * y {
z = z + 1
}
if z * z == x * x + y * y and z <= N then {
// use x, y, z
}
}
}
For N of 1000, this is about 5 times faster than Version 2, but it is still cubic on N.
The next insight is that x and y are the only independent variables; z depends on their values, and the last z value considered for the previous value of y is a good starting search value for the next value of y. That leads to Version 4:
for x in 1..N {
y = x+1
z = y+1
while z <= N {
while z * z < x * x + y * y {
z = z + 1
}
if z * z == x * x + y * y and z <= N then {
// use x, y, z
}
y = y + 1
}
}
which allows y and z to "sweep" the values above x only once. Not only is it over 100 times faster for N of 1000, it is quadratic on N, so the speedup increases as N grows.
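Version 4 translates almost line for line into Python. The sketch below is one possible rendering (the function name triples_v4 is chosen here, and the triples are collected in a list rather than used in place); it is not the Java code timed in this answer.

```python
def triples_v4(limit):
    """Collect all (x, y, z) with x < y < z <= limit and x*x + y*y == z*z."""
    found = []
    for x in range(1, limit + 1):
        y = x + 1
        z = y + 1
        while z <= limit:
            target = x * x + y * y
            # z only ever moves forward: it resumes where the previous y left off
            while z * z < target:
                z += 1
            if z * z == target and z <= limit:
                found.append((x, y, z))
            y += 1
    return found


print(triples_v4(20))
```

Because z is never reset inside the loop over y, the two inner loops together advance z at most up to limit once per value of x, which is where the quadratic bound comes from.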
I've encountered this kind of improvement often enough to be mistrustful of "counting loops" for any but the most trivial uses (e.g. traversing an array).
Update: Apparently I should have pointed out a few things about V4 that are easy to overlook.
Both of the while loops are controlled by the value of z (one directly, the other indirectly through the square of z). The inner while is actually speeding up the outer while, rather than being orthogonal to it. It's important to look at what the loops are doing, not merely to count how many loops there are.
All of the calculations in V4 are strictly integer arithmetic. Conversion to/from floating-point, as well as floating-point calculations, are costly by comparison.
V4 runs in constant memory, requiring only three integer variables. There are no arrays or hash tables to allocate and initialize (and, potentially, to cause an out-of-memory error).
The original question allowed all of x, y, and z to vary over the same range. V1..V4 followed that pattern.
Below is a not-very-scientific set of timings (using Java under Eclipse on my older laptop with other stuff running...), where the "use x, y, z" was implemented by instantiating a Triple object with the three values and putting it in an ArrayList. (For these runs, N was set to 10,000, which produced 12,471 triples in each case.)
Version 4: 46 sec.
using square root: 134 sec.
array and map: 400 sec.
The "array and map" algorithm is essentially:
squares = array of i*i for i in 1 .. N
roots = map of i*i -> i for i in 1 .. N
for x in 1 .. N
for y in x+1 .. N
z = roots[squares[x] + squares[y]]
if z exists use x, y, z
The "using square root" algorithm is essentially:
for x in 1 .. N
for y in x+1 .. N
z = (int) sqrt(x * x + y * y)
if z * z == x * x + y * y then use x, y, z
The actual code for V4 is:
public Collection<Triple> byBetterWhileLoop() {
Collection<Triple> result = new ArrayList<Triple>(limit);
for (int x = 1; x < limit; ++x) {
int xx = x * x;
int y = x + 1;
int z = y + 1;
while (z <= limit) {
int zz = xx + y * y;
while (z * z < zz) {++z;}
if (z * z == zz && z <= limit) {
result.add(new Triple(x, y, z));
}
++y;
}
}
return result;
}
Note that x * x is calculated in the outer loop (although I didn't bother to cache z * z); similar optimizations are done in the other variations.
I'll be glad to provide the Java source code on request for the other variations I timed, in case I've mis-implemented anything.

Substantially faster than any of the solutions so far. Finds triplets via a ternary tree.
Wolfram says:
Hall (1970) and Roberts (1977) prove that (a, b, c) is a primitive Pythagorean triple if and only if
(a,b,c)=(3,4,5)M
where M is a finite product of the matrices U, A, D.
And there we have a formula to generate every primitive triple.
In the above formula, the hypotenuse is ever growing so it's pretty easy to check for a max length.
In Python:
import numpy as np

def gen_prim_pyth_trips(limit=None):
    u = np.mat(' 1 2 2; -2 -1 -2; 2 2 3')
    a = np.mat(' 1 2 2; 2 1 2; 2 2 3')
    d = np.mat('-1 -2 -2; 2 1 2; 2 2 3')
    uad = np.array([u, a, d])
    m = np.array([3, 4, 5])
    while m.size:
        m = m.reshape(-1, 3)
        if limit:
            m = m[m[:, 2] <= limit]
        yield from m
        m = np.dot(m, uad)
If you'd like all triples and not just the primitives:
def gen_all_pyth_trips(limit):
    for prim in gen_prim_pyth_trips(limit):
        i = prim
        for _ in range(limit//prim[2]):
            yield i
            i = i + prim
list(gen_prim_pyth_trips(10**4)) took 2.81 milliseconds to come back with 1593 elements while list(gen_all_pyth_trips(10**4)) took 19.8 milliseconds to come back with 12471 elements.
For reference, the accepted answer (in Python) took 38 seconds for 12471 elements.
Just for fun, setting the upper limit to one million list(gen_all_pyth_trips(10**6)) returns in 2.66 seconds with 1980642 elements (almost 2 million triples in 3 seconds). list(gen_all_pyth_trips(10**7)) brings my computer to its knees as the list gets so large it consumes every last bit of RAM. Doing something like sum(1 for _ in gen_all_pyth_trips(10**7)) gets around that limitation and returns in 30 seconds with 23471475 elements.
For more information on the algorithm used, check out the articles on Wolfram and Wikipedia.

You should define x < y < z.
for x in range (1, 1000):
for y in range (x + 1, 1000):
for z in range(y + 1, 1000):
Another good optimization would be to only use x and y and calculate zsqr = x * x + y * y. If zsqr is a square number (or z = sqrt(zsqr) is a whole number), it is a triplet, else not. That way, you need only two loops instead of three (for your example, that's about 1000 times faster).
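A minimal sketch of the two-loop idea, assuming Python 3.8+ for math.isqrt (an integer square root, which avoids floating-point rounding). The name triples_two_loops and the exclusive upper bound are choices made for this illustration, mirroring range(1, 1000) in the question:

```python
from math import isqrt


def triples_two_loops(limit):
    """Yield (x, y, z) with x < y < z < limit and x*x + y*y == z*z."""
    for x in range(1, limit):
        for y in range(x + 1, limit):
            zsqr = x * x + y * y
            z = isqrt(zsqr)  # floor of the exact square root
            if z * z == zsqr and z < limit:
                yield x, y, z


print(list(triples_two_loops(21)))
```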

The previously listed algorithms for generating Pythagorean triplets are all modifications of the naive approach derived from the basic relationship a^2 + b^2 = c^2 where (a, b, c) is a triplet of positive integers. It turns out that Pythagorean triplets satisfy some fairly remarkable relationships that can be used to generate all Pythagorean triplets.
Euclid discovered the first such relationship. He determined that for every Pythagorean triple (a, b, c), possibly after a reordering of a and b there are relatively prime positive integers m and n with m > n, at least one of which is even, and a positive integer k such that
a = k (2mn)
b = k (m^2 - n^2)
c = k (m^2 + n^2)
Then to generate Pythagorean triplets, generate relatively prime positive integers m and n of differing parity, and a positive integer k and apply the above formula.
struct PythagoreanTriple {
public int a { get; private set; }
public int b { get; private set; }
public int c { get; private set; }
public PythagoreanTriple(int a, int b, int c) : this() {
this.a = a < b ? a : b;
this.b = b < a ? a : b;
this.c = c;
}
public override string ToString() {
return String.Format("a = {0}, b = {1}, c = {2}", a, b, c);
}
public static IEnumerable<PythagoreanTriple> GenerateTriples(int max) {
var triples = new List<PythagoreanTriple>();
for (int m = 1; m <= max / 2; m++) {
for (int n = 1 + (m % 2); n < m; n += 2) {
if (m.IsRelativelyPrimeTo(n)) {
for (int k = 1; k <= max / (m * m + n * n); k++) {
triples.Add(EuclidTriple(m, n, k));
}
}
}
}
return triples;
}
private static PythagoreanTriple EuclidTriple(int m, int n, int k) {
int msquared = m * m;
int nsquared = n * n;
return new PythagoreanTriple(k * 2 * m * n, k * (msquared - nsquared), k * (msquared + nsquared));
}
}
public static class IntegerExtensions {
private static int GreatestCommonDivisor(int m, int n) {
return (n == 0 ? m : GreatestCommonDivisor(n, m % n));
}
public static bool IsRelativelyPrimeTo(this int m, int n) {
return GreatestCommonDivisor(m, n) == 1;
}
}
class Program {
static void Main(string[] args) {
PythagoreanTriple.GenerateTriples(1000).ToList().ForEach(t => Console.WriteLine(t));
}
}
The Wikipedia article on Formulas for generating Pythagorean triples contains other such formulae.
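For readers following the thread in Python, Euclid's formula above can be sketched as follows. This is an illustrative rendering rather than a port of the C# code: the name euclid_triples is made up here, the limit is applied to the hypotenuse c, and the result is sorted only to make it easy to compare with the other answers.

```python
from math import gcd


def euclid_triples(limit):
    """Return all triples (a, b, c) with a < b < c <= limit via Euclid's formula."""
    found = []
    m = 2
    while m * m + 1 <= limit:                 # smallest possible c for this m
        for n in range(1 + m % 2, m, 2):      # n < m with opposite parity to m
            if gcd(m, n) != 1:
                continue                      # not relatively prime: skip
            a, b, c = 2 * m * n, m * m - n * n, m * m + n * n
            if c > limit:
                break                         # c grows with n, so stop here
            for k in range(1, limit // c + 1):
                lo, hi = sorted((k * a, k * b))
                found.append((lo, hi, k * c))
        m += 1
    return sorted(found)


print(euclid_triples(20))
```

Each triple comes from exactly one combination of m, n, and k, so no duplicate filtering is needed.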

Algorithms can be tuned for speed, memory usage, simplicity, and other things.
Here is a pythagore_triplets algorithm tuned for speed, at the cost of memory usage and simplicity. If all you want is speed, this could be the way to go.
Calculation of list(pythagore_triplets(10000)) takes 40 seconds on my computer, versus 63 seconds for ΤΖΩΤΖΙΟΥ's algorithm, and possibly days of calculation for Tafkas's algorithm (and all other algorithms which use 3 embedded loops instead of just 2).
def pythagore_triplets(n=1000):
    maxn=int(n*(2**0.5))+1 # max int whose square may be the sum of two squares
    squares=[x*x for x in xrange(maxn+1)] # calculate all the squares once
    reverse_squares=dict([(squares[i],i) for i in xrange(maxn+1)]) # x*x=>x
    for x in xrange(1,n):
        x2 = squares[x]
        for y in xrange(x,n+1):
            y2 = squares[y]
            z = reverse_squares.get(x2+y2)
            if z != None:
                yield x,y,z
>>> print list(pythagore_triplets(20))
[(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
Note that if you are going to calculate the first billion triplets, then this algorithm will crash before it even starts, because of an out of memory error. So ΤΖΩΤΖΙΟΥ's algorithm is probably a safer choice for high values of n.
BTW, here is Tafkas's algorithm, translated into python for the purpose of my performance tests. Its flaw is to require 3 loops instead of 2.
def gcd(a, b):
    while b != 0:
        t = b
        b = a%b
        a = t
    return a

def find_triple(upper_boundary=1000):
    for c in xrange(5,upper_boundary+1):
        for b in xrange(4,c):
            for a in xrange(3,b):
                if (a*a + b*b == c*c and gcd(a,b) == 1):
                    yield a,b,c

def pyth_triplets(n=1000):
    "Version 1"
    for x in xrange(1, n):
        x2= x*x # time saver
        for y in xrange(x+1, n): # y > x
            z2= x2 + y*y
            zs= int(z2**.5)
            if zs*zs == z2:
                yield x, y, zs
>>> print list(pyth_triplets(20))
[(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
V.1 algorithm has monotonically increasing x values.
EDIT
It seems this question is still alive :)
Since I came back and revisited the code, I tried a second approach which is almost 4 times as fast (about 26% of CPU time for N=10000) as my previous suggestion since it avoids lots of unnecessary calculations:
def pyth_triplets(n=1000):
    "Version 2"
    for z in xrange(5, n+1):
        z2= z*z # time saver
        x= x2= 1
        y= z - 1; y2= y*y
        while x < y:
            x2_y2= x2 + y2
            if x2_y2 == z2:
                yield x, y, z
                x+= 1; x2= x*x
                y-= 1; y2= y*y
            elif x2_y2 < z2:
                x+= 1; x2= x*x
            else:
                y-= 1; y2= y*y
>>> print list(pyth_triplets(20))
[(3, 4, 5), (6, 8, 10), (5, 12, 13), (9, 12, 15), (8, 15, 17), (12, 16, 20)]
Note that this algorithm has increasing z values.
If the algorithm were converted to C (where, being closer to the metal, multiplications take more time than additions), one could minimise the necessary multiplications, given that the step between consecutive squares is:
(x+1)² - x² = (x+1)(x+1) - x² = x² + 2x + 1 - x² = 2x + 1
so all of the inner x2= x*x and y2= y*y would be converted to additions and subtractions like this:
def pyth_triplets(n=1000):
    "Version 3"
    for z in xrange(5, n+1):
        z2= z*z # time saver
        x= x2= 1; xstep= 3
        y= z - 1; y2= y*y; ystep= 2*y - 1
        while x < y:
            x2_y2= x2 + y2
            if x2_y2 == z2:
                yield x, y, z
                x+= 1; x2+= xstep; xstep+= 2
                y-= 1; y2-= ystep; ystep-= 2
            elif x2_y2 < z2:
                x+= 1; x2+= xstep; xstep+= 2
            else:
                y-= 1; y2-= ystep; ystep-= 2
Of course, in Python the extra bytecode produced actually slows down the algorithm compared to version 2, but I would bet (without checking :) that V.3 is faster in C.
Cheers everyone :)

I just extended Kyle Gullion's answer so that triples are sorted by hypotenuse, then by longest side.
It doesn't use numpy, but requires a SortedCollection (or SortedList) such as this one
def primitive_triples():
    """ generates primitive Pythagorean triplets x<y<z
    sorted by hypotenuse z, then longest side y
    through Berggren's matrices and breadth first traversal of ternary tree
    :see: https://en.wikipedia.org/wiki/Tree_of_primitive_Pythagorean_triples
    """
    key=lambda x:(x[2],x[1])
    triples=SortedCollection(key=key)
    triples.insert([3,4,5])
    A = [[ 1,-2, 2], [ 2,-1, 2], [ 2,-2, 3]]
    B = [[ 1, 2, 2], [ 2, 1, 2], [ 2, 2, 3]]
    C = [[-1, 2, 2], [-2, 1, 2], [-2, 2, 3]]
    while triples:
        (a,b,c) = triples.pop(0)
        yield (a,b,c)
        # expand this triple to 3 new triples using Berggren's matrices
        for X in [A,B,C]:
            triple=[sum(x*y for (x,y) in zip([a,b,c],X[i])) for i in range(3)]
            if triple[0]>triple[1]: # ensure x<y<z
                triple[0],triple[1]=triple[1],triple[0]
            triples.insert(triple)
def triples():
    """ generates all Pythagorean triplets x<y<z
    sorted by hypotenuse z, then longest side y
    """
    prim=[] #list of primitive triples up to now
    key=lambda x:(x[2],x[1])
    samez=SortedCollection(key=key) # temp triplets with same z
    buffer=SortedCollection(key=key) # temp for triplets with smaller z
    for pt in primitive_triples():
        z=pt[2]
        if samez and z!=samez[0][2]: #flush samez
            while samez:
                yield samez.pop(0)
        samez.insert(pt)
        #build buffer of smaller multiples of the primitives already found
        for i,pm in enumerate(prim):
            p,m=pm[0:2]
            while True:
                mz=m*p[2]
                if mz < z:
                    buffer.insert(tuple(m*x for x in p))
                elif mz == z:
                    # we need another buffer because next pt might have
                    # the same z as the previous one, but a smaller y than
                    # a multiple of a previous pt ...
                    samez.insert(tuple(m*x for x in p))
                else:
                    break
                m+=1
            prim[i][1]=m #update multiplier for next loops
        while buffer: #flush buffer
            yield buffer.pop(0)
        prim.append([pt,2]) #add primitive to the list
the code is available in the math2 module of my Python library. It is tested against some series of the OEIS (code here at the bottom), which just enabled me to find a mistake in A121727 :-)

I wrote that program in Ruby, and it is similar to the Python implementation. The important line is:
if x*x == y*y + z*z && gcd(y,z) == 1:
Then you have to implement a method that return the greatest common divisor (gcd) of two given numbers. A very simple example in Ruby again:
def gcd(a, b)
while b != 0
t = b
b = a%b
a = t
end
return a
end
The full Ruby method to find the triplets would be:
def find_triple(upper_boundary)
(5..upper_boundary).each {|c|
(4..c-1).each {|b|
(3..b-1).each {|a|
if (a*a + b*b == c*c && gcd(a,b) == 1)
puts "#{a} \t #{b} \t #{c}"
end
}
}
}
end

Old question, but I'll still add my input.
There are two general ways to generate unique Pythagorean triples: one is by scaling, and the other is by using this archaic formula.
Scaling takes a constant n and multiplies a base triple, say 3, 4, 5, by n. Taking n to be 2, we get 6, 8, 10 as our next triple.
Scaling
def pythagoreanScaled(n):
    triplelist = []
    for x in range(1, n):  # start at 1: x = 0 would give (0, 0, 0), which is not a triple
        one = 3*x
        two = 4*x
        three = 5*x
        triple = (one,two,three)
        triplelist.append(triple)
    return triplelist
The formula method uses the fact that if we take a number x > 1 and calculate 2x, x^2-1, and x^2+1, those three will always be a Pythagorean triplet.
Formula
def pythagoreantriple(n):
    triplelist = []
    for x in range(2,n):
        double = x*2
        minus = x**2-1
        plus = x**2+1
        triple = (double,minus,plus)
        triplelist.append(triple)
    return triplelist

Yes, there is.
Okay, now you'll want to know why. Why not just constrain it so that z > y? Try
for z in range (y+1, 1000)

from math import sqrt
from itertools import combinations

#Pythagorean triplet - a^2 + b^2 = c^2 for (a,b) <= (1999,1999)
def gen_pyth(n):
    if n >= 2000 :
        return
    ELEM = [ [ i,j,i*i + j*j ] for i , j in list(combinations(range(1, n + 1 ), 2)) if sqrt(i*i + j*j).is_integer() ]
    print (*ELEM , sep = "\n")

gen_pyth(200)

for a in range(1,20):
    for b in range(1,20):
        for c in range(1,20):
            if a>b and c and c>b:
                if a**2==b**2+c**2:
                    print("triplets are:",a,b,c)

In Python we can store the squares of all the numbers in a separate list,
then form all pairs of the given numbers,
square them,
and finally check whether the sum of the squares of any pair appears in the list of squares.
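The steps above can be sketched as follows. This is one reading of the description, with two assumptions worth flagging: itertools.combinations is used instead of permutations so that each pair is visited once, and the squares are kept in a dict (square -> root) rather than a list so that the final check is a constant-time lookup. The name triples_by_lookup is made up for this example.

```python
from itertools import combinations


def triples_by_lookup(limit):
    """Return all (a, b, c) with a < b < c <= limit and a*a + b*b == c*c."""
    roots = {i * i: i for i in range(1, limit + 1)}  # square -> its root
    found = []
    for a, b in combinations(range(1, limit + 1), 2):
        c = roots.get(a * a + b * b)  # None when the sum is not a stored square
        if c is not None:
            found.append((a, b, c))
    return found


print(triples_by_lookup(20))
```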

Version 5 to Joel Neely.
Since X can be max of 'N-2' and Y can be max of 'N-1' for range of 1..N. Since Z max is N and Y max is N-1, X can be max of Sqrt ( N * N - (N-1) * (N-1) ) = Sqrt ( 2 * N - 1 ) and can start from 3.
MaxX = ( 2 * N - 1 ) ** 0.5
for x in 3..MaxX {
y = x+1
z = y+1
m = x*x + y*y
k = z * z
while z <= N {
while k < m {
z = z + 1
k = k + (2*z) - 1
}
if k == m and z <= N then {
// use x, y, z
}
y = y + 1
m = m + (2 * y) - 1
}
}

Just checking, but I've been using the following code to make pythagorean triples. It's very fast (and I've tried some of the examples here, though I kind of learned them and wrote my own and came back and checked here (2 years ago)). I think this code correctly finds all pythagorean triples up to (name your limit) and fairly quickly too. I used C++ to make it.
ullong is unsigned long long and I created a couple of functions to square and root
my root function basically said if square root of given number (after making it whole number (integral)) squared not equal number give then return -1 because it is not rootable.
_square and _root do as expected as of description above, I know of another way to optimize it but I haven't done nor tested that yet.
generate(vector<Triple>& triplist, ullong limit) {
cout<<"Please wait as triples are being generated."<<endl;
register ullong a, b, c;
register Triple trip;
time_t timer = time(0);
for(a = 1; a <= limit; ++a) {
for(b = a + 1; b <= limit; ++b) {
c = _root(_square(a) + _square(b));
if(c != -1 && c <= limit) {
trip.a = a; trip.b = b; trip.c = c;
triplist.push_back(trip);
} else if(c > limit)
break;
}
}
timer = time(0) - timer;
cout<<"Generated "<<triplist.size()<<" in "<<timer<<" seconds."<<endl;
cin.get();
cin.get();
}
Let me know what you all think. It generates all primitive and non-primitive triples according to the teacher I turned it in for. (she tested it up to 100 if I remember correctly).
The results from the v4 supplied by a previous coder here are
Below is a not-very-scientific set of timings (using Java under Eclipse on my older laptop with other stuff running...), where the "use x, y, z" was implemented by instantiating a Triple object with the three values and putting it in an ArrayList. (For these runs, N was set to 10,000, which produced 12,471 triples in each case.)
Version 4: 46 sec.
using square root: 134 sec.
array and map: 400 sec.
The results from mine are
How many triples to generate: 10000
Please wait as triples are being generated.
Generated 12471 in 2 seconds.
That is before I even start optimizing via the compiler. (I remember previously getting 10000 down to 0 seconds with tons of special options and stuff). My code also generates all the triples with 100,000 as the limit of how high side1,2,hyp can go in 3.2 minutes (I think the 1,000,000 limit takes an hour).
I modified the code a bit and got the 10,000 limit down to 1 second (no optimizations). On top of that, with careful thinking, mine could be broken down into chunks and threaded upon given ranges (for example 100,000 divide into 4 equal chunks for 3 cpu's (1 extra to hopefully consume cpu time just in case) with ranges 1 to 25,000 (start at 1 and limit it to 25,000), 25,000 to 50,000 , 50,000 to 75,000, and 75,000 to end. I may do that and see if it speeds it up any (I will have threads premade and not include them in the actual amount of time to execute the triple function. I'd need a more precise timer and a way to concatenate the vectors. I think that if 1 3.4 GHZ cpu with 8 gb ram at it's disposal can do 10,000 as lim in 1 second then 3 cpus should do that in 1/3 a second (and I round to higher second as is atm).

It should be noted that for a, b, and c you don't need to loop all the way to N.
For a, you only have to loop from 1 to int(sqrt(n**2/2))+1; for b, from a+1 to int(sqrt(n**2-a**2))+1; and for c, from int(sqrt(a**2+b**2)) to int(sqrt(a**2+b**2))+2.

# To find all pythagorean triplets in a range
import math
n = int(input('Enter the upper range of limit'))
for i in range(n+1):
    for j in range(1, i):
        k = math.sqrt(i*i + j*j)
        if k % 1 == 0 and k in range(n+1):
            print(i,j,int(k))

You have to use Euclid's formula for Pythagorean triplets. Follow along below.
Choose any two integers m and n greater than zero, with m > n.
According to Euclid, the triplet is a = m*m-n*n, b = 2*m*n, c = m*m+n*n.
Now apply this formula to find the triplets. Say one value of the triplet is 6; what are the other two? Let's solve:
a = m*m-n*n, b = 2*m*n, c = m*m+n*n
b = 2*m*n is always even, so set
2*m*n = 6 => m*n = 3 => m*n = 3*1 => m = 3, n = 1
In general you can take any pair m, n whose product is half the given number (here m*n = 3).
Now, with m = 3 and n = 1,
a = m*m-n*n = 3*3-1*1 = 8, c = m*m+n*n = 3*3+1*1 = 10
So 6, 8, 10 is our triplet. This shows how triplets are generated.
If the given number is odd, like 9, the approach changes slightly, because b = 2*m*n
is never odd. Here we have to take
a = m*m-n*n = 9, so (m+n)*(m-n) = 9*1, so m+n = 9 and m-n = 1.
Now find m and n from here (m = 5, n = 4), then find the other two values (40 and 41).
If you don't understand it, read it again carefully.
Write the code according to this, and it will generate distinct triplets efficiently.

A non-numpy version of the Hall/Roberts approach is
def pythag3(limit=None, all=False):
    """generate Pythagorean triples which are primitive (default)
    or without restriction (when ``all`` is True). The elements
    returned in the tuples are sorted with the smallest first.

    Examples
    ========

    >>> list(pythag3(20))
    [(3, 4, 5), (8, 15, 17), (5, 12, 13)]
    >>> list(pythag3(20, True))
    [(3, 4, 5), (6, 8, 10), (9, 12, 15), (12, 16, 20), (8, 15, 17), (5, 12, 13)]
    """
    if limit and limit < 5:
        return
    m = [(3,4,5)] # primitives stored here
    while m:
        x, y, z = m.pop()
        if x > y:
            x, y = y, x
        yield (x, y, z)
        if all:
            a, b, c = x, y, z
            while 1:
                c += z
                if c > limit:
                    break
                a += x
                b += y
                yield a, b, c
        # new primitives
        a = x - 2*y + 2*z, 2*x - y + 2*z, 2*x - 2*y + 3*z
        b = x + 2*y + 2*z, 2*x + y + 2*z, 2*x + 2*y + 3*z
        c = -x + 2*y + 2*z, -2*x + y + 2*z, -2*x + 2*y + 3*z
        for d in (a, b, c):
            if d[2] <= limit:
                m.append(d)
It's slower than the numpy-coded version but the primitives with largest element less than or equal to 10^6 are generated on my slow machine in about 1.4 seconds. (And the list m never grew beyond 18 elements.)

In c language -
#include<stdio.h>
int main()
{
int n;
printf("How many triplets needed : \n");
scanf("%d",&n);
for(int i=1;i<=2000;i++)
{
for(int j=i;j<=2000;j++)
{
for(int k=j;k<=2000;k++)
{
if((j*j+i*i==k*k) && (n>0))
{
printf("%d %d %d\n",i,j,k);
n=n-1;
}
}
}
}
}

You can try this
triplets=[]
for a in range(1,100):
    for b in range(1,100):
        for c in range(1,100):
            if a**2 + b**2==c**2:
                i=[a,b,c]
                triplets.append(i)
for i in triplets:
    i.sort()
    if triplets.count(i)>1:
        triplets.remove(i)
print(triplets)

How can I implement this point in polygon code in Python?

So, for my Computer Graphics class I was tasked with doing a Polygon Filler, my software renderer is currently being coded in Python. Right now, I want to test this pointInPolygon code I found at: How can I determine whether a 2D Point is within a Polygon? so I can make my own method later on basing myself on that one.
The code is:
int pnpoly(int nvert, float *vertx, float *verty, float testx, float testy)
{
int i, j, c = 0;
for (i = 0, j = nvert-1; i < nvert; j = i++) {
if ( ((verty[i]>testy) != (verty[j]>testy)) &&
(testx < (vertx[j]-vertx[i]) * (testy-verty[i]) / (verty[j]-verty[i]) + vertx[i]) )
c = !c;
}
return c;
}
And my attempt to recreate it in Python is as following:
def pointInPolygon(self, nvert, vertx, verty, testx, testy):
    c = 0
    j = nvert-1
    for i in range(nvert):
        if(((verty[i]>testy) != (verty[j]>testy)) and (testx < (vertx[j]-vertx[i]) * (testy-verty[i]) / (verty[j]-verty[i] + vertx[i]))):
            c = not c
        j += 1
    return c
But this obviously will return a index out of range in the second iteration because j = nvert and it will crash.
Thanks in advance.

You're reading the tricky C code incorrectly. The point of j = i++ is to both increment i by one and assign the old value to j. Similar python code would do j = i at the end of the loop:
j = nvert - 1
for i in range(nvert):
...
j = i
The idea is that for nvert == 3, the values would go
j | i
---+---
2 | 0
0 | 1
1 | 2
Another way to achieve this is that j equals (i - 1) % nvert,
for i in range(nvert):
j = (i - 1) % nvert
...
i.e. it is lagging one behind, and the indices form a ring (like the vertices do)
More Pythonic code would use itertools and iterate over the coordinates themselves. You'd have a list of pairs (tuples) called vertices, and two iterators, one of which is one vertex ahead of the other and cycles back to the beginning thanks to itertools.cycle, something like:
# make one iterator that goes one ahead and wraps around at the end
next_ones = itertools.cycle(vertices)
next(next_ones)
c = False
for ((x1, y1), (x2, y2)) in zip(vertices, next_ones):
    # unchecked...
    # note: "+ x1" belongs outside the division, as in the C original
    if (((y1 > testy) != (y2 > testy))
            and (testx < (x2 - x1) * (testy - y1) / (y2 - y1) + x1)):
        c = not c
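Putting the pieces together, a self-contained version of the index-based loop might look like the sketch below. It is a direct transcription of the C pnpoly function under two assumptions of this example: the polygon is passed as a list of (x, y) tuples instead of separate coordinate arrays, and the function is a plain function named point_in_polygon rather than a method.

```python
def point_in_polygon(vertices, testx, testy):
    """Ray-casting test: True if (testx, testy) lies inside the polygon."""
    inside = False
    j = len(vertices) - 1          # j starts at the last vertex, like the C loop
    for i in range(len(vertices)):
        xi, yi = vertices[i]
        xj, yj = vertices[j]
        # The edge straddles the horizontal line through the test point, and
        # the crossing lies to the right of the point. yi != yj is guaranteed
        # whenever the first condition holds, so the division is safe.
        if (yi > testy) != (yj > testy) and \
                testx < (xj - xi) * (testy - yi) / (yj - yi) + xi:
            inside = not inside
        j = i                      # j lags one step behind i
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(square, 2, 2))
print(point_in_polygon(square, 5, 2))
```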

Converting MATLAB code to Python: Python types and order of operations

This is a MATLAB function from the author of RainbowCrack:
function ret = calc_success_probability(N, t, m)
arr = zeros(1, t - 1);
arr(1) = m;
for i = 2 : t - 1
arr(i) = N * (1 - (1 - 1 / N) ^ arr(i - 1));
end
exp = 0;
for i = 1 : t - 1
exp = exp + arr(i);
end
ret = 1 - (1 - 1 / N) ^ exp;
It calculates the probability of success in finding a plaintext password given a rainbow table with keyspace N, a large unsigned integer, chain of length t, and number of chains m.
A sample run:
calc_success_probability(80603140212, 2400, 40000000)
Returns 0.6055.
I am having difficulty converting this into Python. In Python 3, there is no max integer anymore, so N isn't an issue. I think in the calculations I have to force everything to a large floating point number, but I'm not sure.
I also don't know the order of operations in MATLAB. I think the code is saying this:
Create array of size [1 .. 10] so ten elements
Initialize every element of that array with zero
In zero-based indexing, I think this would be array[0 .. t-1], it looks like MATLAB uses 1 as the first (0'th) index.
Then second element of array (0-based indexing) initialized to m.
For each element in array, pos=1 (0-based indexing) to t-1:
array[pos] = N * (1 - (1 - 1/N) ** array[pos-1]
Where ** is the power operator. I think power is ^ in MATLAB, so N * (1 - (1-1/N) to the array[pos-1] power is like that above.
Then set an exponent. For each element in array 0 to t-1:
exponent is exponent + 1
return probability = 1 - (1 - 1/N) power of exp;
My Python code looks like this, and doesn't work. I can't figure out why, but it could be that I don't understand MATLAB enough, or Python, both, or I'm reading the math wrong somehow and what's going on in MATLAB is not what I'm expecting, i.e. I have order of operations and/or types wrong to make it work and I'm missing something in those terms...
def calc_success_probability(N, t, m):
    comp_arr = []
    # array with indices 1 to t-1 in MATLAB, which is otherwise 0 to t-2???
    # range with 0, t is 0 to t excluding t, so t here is t-1, t-1 is up
    # to including t-2... sounds wrong...
    for i in range(0, t-1):
        # initialize array
        comp_arr.append(0)
    print("t = {0:d}, array size is {1:d}".format(t, len(comp_arr)))
    # zero'th element chain count
    comp_arr[0] = m
    for i in range(1, t-1):
        comp_arr[i] = N * (1 - (1 - 1 / N)) ** comp_arr[i-1]
    final_exp = 0
    for i in range(0, t-1):
        final_exp = final_exp + comp_arr[i]
    probability = (1 - (1 - 1 / N)) ** final_exp
    return probability

Watch your brackets! You have translated this:
arr(i) = N * ( 1 - ( 1 - 1 / N ) ^ arr(i - 1) );
to this:
comp_arr[i] = N * ( 1 - ( 1 - 1 / N ) ) ** comp_arr[i-1]
I've lined up everything so you can better see where it goes wrong. You've moved a bracket to the wrong location.
It should be:
comp_arr[i] = N * ( 1 - ( 1 - 1 / N ) ** comp_arr[i-1] )
Similarly,
ret = 1 - (1 - 1 / N) ^ exp;
is not the same as
probability = (1 - (1 - 1 / N)) ** final_exp
This should be
probability = 1 - (1 - 1 / N) ** final_exp
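With both brackets moved back, the whole function can be written compactly. This is a sketch assuming Python 3, where / is true division (in Python 2, 1 / N with an integer N would be 0), and it uses 0-based indices 0..t-2 for MATLAB's 1..t-1:

```python
def calc_success_probability(N, t, m):
    """Python 3 transcription of the MATLAB function, with the brackets fixed."""
    arr = [0.0] * (t - 1)   # MATLAB arr(1..t-1) becomes arr[0..t-2]
    arr[0] = m              # MATLAB arr(1) = m
    for i in range(1, t - 1):
        arr[i] = N * (1 - (1 - 1 / N) ** arr[i - 1])
    exponent = sum(arr)     # replaces the second MATLAB loop
    return 1 - (1 - 1 / N) ** exponent


print(calc_success_probability(80603140212, 2400, 40000000))
```

The printed value can be compared against the MATLAB result quoted in the question.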

Solving Puzzle in Python

I got one puzzle and I want to solve it using Python.
Puzzle:
A merchant has a 40 kg weight which he used in his shop. Once, it fell
from his hands and was broken into 4 pieces. But surprisingly, now he
can weigh any weight between 1 kg to 40 kg with the combination of
these 4 pieces.
So question is, what are weights of those 4 pieces?
Now I wanted to solve this in Python.
The only constraint I got from the puzzle is that the sum of the 4 pieces is 40. With that I could filter all the sets of 4 values whose sum is 40.
import itertools as it
weight = 40
full = range(1,41)
comb = [x for x in it.combinations(full,4) if sum(x)==40]
length of comb = 297
Now I need to check each set of values in comb and try all the combinations of operations.
E.g. if (a,b,c,d) is the first set of values in comb, I need to check a,b,c,d,a+b,a-b, .................a+b+c-d,a-b+c+d........ and so on.
I tried a lot, but I am stuck at this stage, i.e. how to check all these combinations of calculations for each set of 4 values.
Question :
1) I think I need to get a list of all possible combinations of [a,b,c,d] and [+,-].
2) Does anyone have a better idea, and can you tell me how to go forward from here?
Also, I want to do it completely without the help of any external libraries; I need to use only the standard library of Python.
EDIT : Sorry for the late info. Its answer is (1,3,9,27), which I found a few years back. I have checked and verified the answer.
EDIT : At present, fraxel's answer works perfect with time = 0.16 ms. A better and faster approach is always welcome.
Regards
ARK

Earlier walk-through answer:
We know a*A + b*B + c*C + d*D = x for all x between 0 and 40, and a, b, c, d are confined to -1, 0, 1. Clearly A + B + C + D = 40. The next case is x = 39, so clearly the smallest move is to remove an element (it is the only possible move that could result in successfully balancing against 39):
A + B + C = 39, so D = 1, by necessity.
next:
A + B + C - D = 38
next:
A + B + D = 37, so C = 3
then:
A + B = 36
then:
A + B - D = 35
A + B - C + D = 34
A + B - C = 33
A + B - C - D = 32
A + C + D = 31, so A = 9
Therefore B = 27
So the weights are 1, 3, 9, 27
Really this can be deduced immediately from the fact that they must all be powers of 3.
Interesting Update:
So here is some python code to find a minimum set of weights for any dropped weight that will span the space:
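def find_weights(W):
    weights = []
    i = 0
    # keep adding powers of 3 until their sum reaches W
    while sum(weights) < W:
        weights.append(3 ** i)
        i += 1
    # replace the last power with whatever is left of W
    weights.pop()
    weights.append(W - sum(weights))
    return weights
print(find_weights(40))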
#output:
[1, 3, 9, 27]
To further illustrate this explanation, one can consider the problem as finding the minimum number of weights to span the number space [0, 40]. It is evident that the number of things you can do with each weight is ternary (add weight, remove weight, put weight on other side). So if we write our (unknown) weights (A, B, C, D) in descending order, our moves can be summarised as:
      ABCD:   Ternary:
40:   ++++    0000
39:   +++0    0001
38:   +++-    0002
37:   ++0+    0010
36:   ++00    0011
35:   ++0-    0012
34:   ++-+    0020
33:   ++-0    0021
32:   ++--    0022
31:   +0++    0100
etc.
I have put ternary counting from 0 to 9 alongside, to illustrate that we are effectively in a trinary number system (base 3). Our solution can always be written as:
3**0 + 3**1 +3**2 +...+ 3**N >= Weight
For the minimum N that this holds true. The minimum solution will ALWAYS be of this form.
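The ternary view above can be turned into a direct way of finding where each piece goes for a given target. The sketch below assumes the pieces are consecutive powers of 3 starting at 1 (as in the 1, 3, 9, 27 solution); the function name and the +1 / 0 / -1 encoding (same pan as the other weights, unused, or on the object's side) are illustrative choices, not part of the original answer.

```python
def balanced_ternary_placement(x, pieces=(1, 3, 9, 27)):
    """Return {piece: sign} with sign in (-1, 0, +1) and sum(sign * piece) == x.

    Assumes `pieces` are consecutive powers of 3 in ascending order.
    """
    placement = {}
    remaining = x
    for piece in pieces:
        digit = remaining % 3
        if digit == 2:
            # A ternary digit 2 becomes -1 with a carry into the next place.
            digit = -1
        placement[piece] = digit
        remaining = (remaining - digit) // 3
    if remaining != 0:
        raise ValueError("target is out of range for these pieces")
    return placement


print(balanced_ternary_placement(38))
```

For 38 this yields -1 for the 1 kg piece and +1 for the others, i.e. 27 + 9 + 3 - 1, which is the "+++-" row of the table.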
Furthermore, we can easily solve the problem for large weights and find the minimum number of pieces to span the space:
A man drops a known weight W, it breaks into pieces. His new weights allow him to weigh any weight up to W. How many weights are there, and what are they?
#what if the dropped weight was a million Kg:
print(find_weights(1000000))
#output:
[1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683, 59049, 177147, 531441, 202839]
Try using permutations for a large weight and unknown number of pieces!!

Here is a brute-force itertools solution:
import itertools as it
def merchant_puzzle(weight, pieces):
    full = range(1, weight+1)
    all_nums = set(full)
    comb = [x for x in it.combinations(full, pieces) if sum(x)==weight]
    funcs = (lambda x: 0, lambda x: x, lambda x: -x)
    for c in comb:
        sums = set()
        for fmap in it.product(funcs, repeat=pieces):
            s = sum(f(x) for x, f in zip(c, fmap))
            if s > 0:
                sums.add(s)
        if sums == all_nums:
            return c
>>> merchant_puzzle(40, 4)
(1, 3, 9, 27)
For an explanation of how it works, check out the answer Avaris gave, this is an implementation of the same algorithm.

You are close, very close :).
Since this is a puzzle you want to solve, I'll just give pointers. For this part:
Eg if (a,b,c,d) is the first set of values in comb, i need to check
a,b,c,d,a+b,a-b, .................a+b+c-d,a-b+c+d........ and so on.
Consider this: Each weight can be put to one scale, the other or neither. So for the case of a, this can be represented as [a, -a, 0]. Same with the other three. Now you need all possible pairings with these 3 possibilities for each weight (hint: itertools.product). Then, a possible measuring of a pairing (lets say: (a, -b, c, 0)) is merely the sum of these (a-b+c+0).
All that is left is just checking if you could 'measure' all the required weights. set might come handy here.
PS: As it was stated in the comments, for the general case, it might not be necessary that these divided weights should be distinct (for this problem it is). You might reconsider itertools.combinations.
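As a small illustration of the pointers above, the check for a single candidate tuple could be sketched like this. The function name is made up for the example, and it only covers the "which totals can this set of pieces measure" step, not the search over candidate tuples.

```python
import itertools


def measurable_weights(pieces):
    """Set of positive totals reachable when each piece is used as +p, -p or 0."""
    choices = [(p, -p, 0) for p in pieces]
    totals = set()
    for combo in itertools.product(*choices):
        total = sum(combo)
        if total > 0:
            totals.add(total)
    return totals


# A candidate tuple solves the puzzle if it can measure every weight from 1 to 40.
print(measurable_weights((1, 3, 9, 27)) == set(range(1, 41)))
```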

I brute forced the hell out of the second part.
Do not click this if you don't want to see the answer. Obviously, if I was better at permutations, this would have required a lot less cut/paste search/replace:
http://pastebin.com/4y2bHCVr

I don't know Python syntax, but maybe you can decode this Scala code; start with the 2nd for-loop:
def setTo40 (a: Int, b: Int, c: Int, d: Int) = {
val vec = for (
fa <- List (0, 1, -1);
fb <- List (0, 1, -1);
fc <- List (0, 1, -1);
fd <- List (0, 1, -1);
prod = fa * a + fb * b + fc * c + fd * d;
if (prod > 0)
) yield (prod)
vec.toSet
}
for (a <- (1 to 9);
b <- (a to 14);
c <- (b to 20);
d = 40-(a+b+c)
if (d > 0)) {
if (setTo40 (a, b, c, d).size > 39)
println (a + " " + b + " " + c + " " + d)
}

With weights [2, 5, 15, 18] you can also measure all objects between 1 and 40kg (assuming they weigh a whole number of kilograms), although some of them will need to be measured indirectly. For example, to measure an object weighing 39kg, you would first compare it with 40kg and the balance would tip to the 40kg side (because 39 < 40), but then if you remove the 2kg weight it would tip to the other side (because 39 > 38), and thus you can conclude the object weighs 39kg.
More interestingly, with weights [2, 5, 15, 45] you can measure all objects up to 67kg.
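The indirect-measurement idea for [2, 5, 15, 18] can be checked with a short sketch. It assumes objects weigh a whole number of kilograms, and it treats a weight w as identifiable if it can be balanced exactly or if both w - 1 and w + 1 can be balanced (so the object is bracketed from both sides). The helper names are hypothetical.

```python
import itertools


def exact_weights(pieces):
    """Positive totals that can be balanced exactly with the given pieces."""
    totals = set()
    for signs in itertools.product((-1, 0, 1), repeat=len(pieces)):
        total = sum(s * p for s, p in zip(signs, pieces))
        if total > 0:
            totals.add(total)
    return totals


def identifiable(w, pieces):
    """True if w balances exactly, or is bracketed by w - 1 and w + 1."""
    exact = exact_weights(pieces)
    return w in exact or (w - 1 in exact and w + 1 in exact)


pieces = [2, 5, 15, 18]
print(all(identifiable(w, pieces) for w in range(1, 41)))
```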

If anyone doesn't want to import a library to import combos/perms, this will generate all possible 4-move strategies...
# generates permutations of repeated values
def permutationsWithRepeats(n, v):
    perms = []
    value = [0] * n
    N = n - 1
    i = n - 1
    while i > -1:
        perms.append(list(value))
        if value[N] < v:
            value[N] += 1
        else:
            while (i > -1) and (value[i] == v):
                value[i] = 0
                i -= 1
            if i > -1:
                value[i] += 1
                i = N
    return perms
# generates the all possible permutations of 4 ternary moves
def strategy():
    move = ['-', '0', '+']
    perms = permutationsWithRepeats(4, 2)
    for i in range(len(perms)):
        s = ''
        for j in range(4):
            s += move[perms[i][j]]
        print(s)
# execute
strategy()

My solution as follows:
#!/usr/bin/env python3
weight = 40
parts = 4
part=[0] * parts
def test_solution(p, weight,show_result=False):
    cv=[0,0,0,0]
    for check_weight in range(1,weight+1):
        sum_ok = False
        for parts_used in range(2 ** parts):
            for options in range(2 ** parts):
                for pos in range(parts):
                    pos_neg = int('{0:0{1}b}'.format(options,parts)[pos]) * 2 - 1
                    use = int('{0:0{1}b}'.format(parts_used,parts)[pos])
                    cv[pos] = p[pos] * pos_neg * use
                if sum(cv) == check_weight:
                    if show_result:
                        print("{} = sum of:{}".format(check_weight, cv))
                    sum_ok = True
                    break
        if sum_ok:
            continue
        else:
            return False
    return True
for part[0] in range(1,weight-parts):
    for part[1] in range(part[0]+1, weight - part[0]):
        for part[2] in range( part[1] + 1 , weight - sum(part[0:2])):
            part[3] = weight - sum(part[0:3])
            if test_solution(part,weight):
                print(part)
                test_solution(part,weight,True)
                exit()
It gives you all the solutions for the given weights

More dynamic than my previous answer, so it also works with other numbers. But breaking up into 5 pieces takes some time:
#!/usr/bin/env python3
weight = 121
nr_of_parts = 5
# weight = 40
# nr_of_parts = 4
weight = 13
nr_of_parts = 3
part=[0] * nr_of_parts
def test_solution(p, weight,show_result=False):
cv=[0] * nr_of_parts
for check_weight in range(1,weight+1):
sum_ok = False
for nr_of_parts_used in range(2 ** nr_of_parts):
for options in range(2 ** nr_of_parts):
for pos in range(nr_of_parts):
pos_neg = int('{0:0{1}b}'.format(options,nr_of_parts)[pos]) * 2 - 1
use = int('{0:0{1}b}'.format(nr_of_parts_used,nr_of_parts)[pos])
cv[pos] = p[pos] * pos_neg * use
if sum(cv) == check_weight:
if show_result:
print("{} = sum of:{}".format(check_weight, cv))
sum_ok = True
break
if sum_ok:
continue
else:
return False
return True
def set_parts(part,position, nr_of_parts, weight):
if position == 0:
part[position] = 1
part, valid = set_parts(part,position+1,nr_of_parts,weight)
return part, valid
if position == nr_of_parts - 1:
part[position] = weight - sum(part)
if part[position -1] >= part[position]:
return part, False
return part, True
part[position]=max(part[position-1]+1,part[position])
part, valid = set_parts(part, position + 1, nr_of_parts, weight)
if not valid:
part[position]=max(part[position-1]+1,part[position]+1)
part=part[0:position+1] + [0] * (nr_of_parts - position - 1)
part, valid = set_parts(part, position + 1, nr_of_parts, weight)
return part, valid
while True:
part, valid = set_parts(part, 0, nr_of_parts, weight)
if not valid:
print(part)
print ('No solution posible')
exit()
if test_solution(part,weight):
print(part,' ')
test_solution(part,weight,True)
exit()
else:
print(part,' ', end='\r')

Generating unique, ordered Pythagorean triplets

This is a program I wrote to calculate Pythagorean triplets. When I run the program it prints each set of triplets twice because of the if statement. Is there any way I can tell the program to only print a new set of triplets once? Thanks.
import math
def main():
    for x in range (1, 1000):
        for y in range (1, 1000):
            for z in range(1, 1000):
                if x*x == y*y + z*z:
                    print y, z, x
                    print '-'*50
if __name__ == '__main__':
    main()

Pythagorean Triples make a good example for claiming "for loops considered harmful", because for loops seduce us into thinking about counting, often the most irrelevant part of a task.
(I'm going to stick with pseudo-code to avoid language biases, and to keep the pseudo-code streamlined, I'll not optimize away multiple calculations of e.g. x * x and y * y.)
Version 1:
for x in 1..N {
    for y in 1..N {
        for z in 1..N {
            if x * x + y * y == z * z then {
                // use x, y, z
            }
        }
    }
}
is the worst solution. It generates duplicates, and traverses parts of the space that aren't useful (e.g. whenever z < y). Its time complexity is cubic on N.
Version 2, the first improvement, comes from requiring x < y < z to hold, as in:
for x in 1..N {
    for y in x+1..N {
        for z in y+1..N {
            if x * x + y * y == z * z then {
                // use x, y, z
            }
        }
    }
}
which reduces run time and eliminates duplicated solutions. However, it is still cubic on N; the improvement is just a reduction of the co-efficient of N-cubed.
It is pointless to continue examining increasing values of z after z * z < x * x + y * y no longer holds. That fact motivates Version 3, the first step away from brute-force iteration over z:
for x in 1..N {
    for y in x+1..N {
        z = y + 1
        while z * z < x * x + y * y {
            z = z + 1
        }
        if z * z == x * x + y * y and z <= N then {
            // use x, y, z
        }
    }
}
For N of 1000, this is about 5 times faster than Version 2, but it is still cubic on N.
The next insight is that x and y are the only independent variables; z depends on their values, and the last z value considered for the previous value of y is a good starting search value for the next value of y. That leads to Version 4:
for x in 1..N {
    y = x+1
    z = y+1
    while z <= N {
        while z * z < x * x + y * y {
            z = z + 1
        }
        if z * z == x * x + y * y and z <= N then {
            // use x, y, z
        }
        y = y + 1
    }
}
which allows y and z to "sweep" the values above x only once. Not only is it over 100 times faster for N of 1000, it is quadratic on N, so the speedup increases as N grows.
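For readers who prefer something runnable, Version 4 might be transcribed into Python roughly as follows. This is only a sketch of the pseudo-code above: the function name is made up, and collecting the results in a list stands in for "use x, y, z".

```python
def triples_v4(n):
    """Pythagorean triples x < y < z <= n, using the Version 4 sweep."""
    result = []
    for x in range(1, n + 1):
        y = x + 1
        z = y + 1
        while z <= n:
            target = x * x + y * y
            # z only ever moves forward; it is never reset for a new y.
            while z * z < target:
                z += 1
            if z * z == target and z <= n:
                result.append((x, y, z))
            y += 1
    return result


print(triples_v4(20))
```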
I've encountered this kind of improvement often enough to be mistrustful of "counting loops" for any but the most trivial uses (e.g. traversing an array).
Update: Apparently I should have pointed out a few things about V4 that are easy to overlook.
Both of the while loops are controlled by the value of z (one directly, the other indirectly through the square of z). The inner while is actually speeding up the outer while, rather than being orthogonal to it. It's important to look at what the loops are doing, not merely to count how many loops there are.
All of the calculations in V4 are strictly integer arithmetic. Conversion to/from floating-point, as well as floating-point calculations, are costly by comparison.
V4 runs in constant memory, requiring only three integer variables. There are no arrays or hash tables to allocate and initialize (and, potentially, to cause an out-of-memory error).
The original question allowed all of x, y, and z to vary over the same range. V1..V4 followed that pattern.
Below is a not-very-scientific set of timings (using Java under Eclipse on my older laptop with other stuff running...), where the "use x, y, z" was implemented by instantiating a Triple object with the three values and putting it in an ArrayList. (For these runs, N was set to 10,000, which produced 12,471 triples in each case.)
Version 4: 46 sec.
using square root: 134 sec.
array and map: 400 sec.
The "array and map" algorithm is essentially:
squares = array of i*i for i in 1 .. N
roots = map of i*i -> i for i in 1 .. N
for x in 1 .. N
    for y in x+1 .. N
        z = roots[squares[x] + squares[y]]
        if z exists use x, y, z
The "using square root" algorithm is essentially:
for x in 1 .. N
    for y in x+1 .. N
        z = (int) sqrt(x * x + y * y)
        if z * z == x * x + y * y then use x, y, z
The actual code for V4 is:
public Collection<Triple> byBetterWhileLoop() {
    Collection<Triple> result = new ArrayList<Triple>(limit);
    for (int x = 1; x < limit; ++x) {
        int xx = x * x;
        int y = x + 1;
        int z = y + 1;
        while (z <= limit) {
            int zz = xx + y * y;
            while (z * z < zz) {++z;}
            if (z * z == zz && z <= limit) {
                result.add(new Triple(x, y, z));
            }
            ++y;
        }
    }
    return result;
}
Note that x * x is calculated in the outer loop (although I didn't bother to cache z * z); similar optimizations are done in the other variations.
I'll be glad to provide the Java source code on request for the other variations I timed, in case I've mis-implemented anything.

Substantially faster than any of the solutions so far. Finds triplets via a ternary tree.
Wolfram says:
Hall (1970) and Roberts (1977) prove that (a, b, c) is a primitive Pythagorean triple if and only if
(a,b,c)=(3,4,5)M
where M is a finite product of the matrices U, A, D.
And there we have a formula to generate every primitive triple.
In the above formula, the hypotenuse is ever growing so it's pretty easy to check for a max length.
In Python:
import numpy as np
def gen_prim_pyth_trips(limit=None):
    u = np.mat(' 1 2 2; -2 -1 -2; 2 2 3')
    a = np.mat(' 1 2 2; 2 1 2; 2 2 3')
    d = np.mat('-1 -2 -2; 2 1 2; 2 2 3')
    uad = np.array([u, a, d])
    m = np.array([3, 4, 5])
    while m.size:
        m = m.reshape(-1, 3)
        if limit:
            m = m[m[:, 2] <= limit]
        yield from m
        m = np.dot(m, uad)
If you'd like all triples and not just the primitives:
def gen_all_pyth_trips(limit):
    for prim in gen_prim_pyth_trips(limit):
        i = prim
        for _ in range(limit//prim[2]):
            yield i
            i = i + prim
list(gen_prim_pyth_trips(10**4)) took 2.81 milliseconds to come back with 1593 elements while list(gen_all_pyth_trips(10**4)) took 19.8 milliseconds to come back with 12471 elements.
For reference, the accepted answer (in Python) took 38 seconds for 12471 elements.
Just for fun, setting the upper limit to one million list(gen_all_pyth_trips(10**6)) returns in 2.66 seconds with 1980642 elements (almost 2 million triples in 3 seconds). list(gen_all_pyth_trips(10**7)) brings my computer to its knees as the list gets so large it consumes every last bit of RAM. Doing something like sum(1 for _ in gen_all_pyth_trips(10**7)) gets around that limitation and returns in 30 seconds with 23471475 elements.
For more information on the algorithm used, check out the articles on Wolfram and Wikipedia.

You should define x < y < z.
for x in range (1, 1000):
    for y in range (x + 1, 1000):
        for z in range(y + 1, 1000):
Another good optimization would be to only use x and y and calculate zsqr = x * x + y * y. If zsqr is a square number (or z = sqrt(zsqr) is a whole number), it is a triplet, else not. That way, you need only two loops instead of three (for your example, that's about 1000 times faster).
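A minimal sketch of that two-loop idea is shown below. It assumes Python 3.8+ so that math.isqrt (an exact integer square root) is available, which avoids floating-point rounding; the function name and the limit parameter are illustrative, with limit playing the role of the 1000 in the loops above.

```python
import math


def triples_two_loops(limit):
    """Triples x < y < z with all three values below `limit`."""
    found = []
    for x in range(1, limit):
        for y in range(x + 1, limit):
            zsqr = x * x + y * y
            z = math.isqrt(zsqr)  # exact integer square root
            if z < limit and z * z == zsqr:
                found.append((x, y, z))
    return found


print(triples_two_loops(21))
```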

The previously listed algorithms for generating Pythagorean triplets are all modifications of the naive approach derived from the basic relationship a^2 + b^2 = c^2 where (a, b, c) is a triplet of positive integers. It turns out that Pythagorean triplets satisfy some fairly remarkable relationships that can be used to generate all Pythagorean triplets.
Euclid discovered the first such relationship. He determined that for every Pythagorean triple (a, b, c), possibly after a reordering of a and b there are relatively prime positive integers m and n with m > n, at least one of which is even, and a positive integer k such that
a = k (2mn)
b = k (m^2 - n^2)
c = k (m^2 + n^2)
Then to generate Pythagorean triplets, generate relatively prime positive integers m and n of differing parity, and a positive integer k and apply the above formula.
struct PythagoreanTriple {
public int a { get; private set; }
public int b { get; private set; }
public int c { get; private set; }
public PythagoreanTriple(int a, int b, int c) : this() {
this.a = a < b ? a : b;
this.b = b < a ? a : b;
this.c = c;
}
public override string ToString() {
return String.Format("a = {0}, b = {1}, c = {2}", a, b, c);
}
public static IEnumerable<PythagoreanTriple> GenerateTriples(int max) {
var triples = new List<PythagoreanTriple>();
for (int m = 1; m <= max / 2; m++) {
for (int n = 1 + (m % 2); n < m; n += 2) {
if (m.IsRelativelyPrimeTo(n)) {
for (int k = 1; k <= max / (m * m + n * n); k++) {
triples.Add(EuclidTriple(m, n, k));
}
}
}
}
return triples;
}
private static PythagoreanTriple EuclidTriple(int m, int n, int k) {
int msquared = m * m;
int nsquared = n * n;
return new PythagoreanTriple(k * 2 * m * n, k * (msquared - nsquared), k * (msquared + nsquared));
}
}
public static class IntegerExtensions {
private static int GreatestCommonDivisor(int m, int n) {
return (n == 0 ? m : GreatestCommonDivisor(n, m % n));
}
public static bool IsRelativelyPrimeTo(this int m, int n) {
return GreatestCommonDivisor(m, n) == 1;
}
}
class Program {
static void Main(string[] args) {
PythagoreanTriple.GenerateTriples(1000).ToList().ForEach(t => Console.WriteLine(t));
}
}
The Wikipedia article on Formulas for generating Pythagorean triples contains other such formulae.
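For comparison with the other Python answers, Euclid's formula could be sketched in Python as follows. This is not a translation of the C# code line by line: the function name is made up, the bound is placed on the hypotenuse c, and the m loop stops as soon as even the smallest possible hypotenuse for that m would exceed the bound.

```python
from math import gcd


def euclid_triples(max_c):
    """All triples (a, b, c) with a < b < c <= max_c, via Euclid's formula."""
    triples = []
    m = 2
    while m * m + 1 <= max_c:  # m*m + 1 is the smallest hypotenuse for this m
        # Step by 2 from 1 or 2 so that m and n always have opposite parity.
        for n in range(1 + m % 2, m, 2):
            if gcd(m, n) != 1:
                continue
            a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
            k = 1
            while k * c <= max_c:
                legs = sorted((k * a, k * b))
                triples.append((legs[0], legs[1], k * c))
                k += 1
        m += 1
    return triples


print(sorted(euclid_triples(20)))
```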

Algorithms can be tuned for speed, memory usage, simplicity, and other things.
Here is a pythagore_triplets algorithm tuned for speed, at the cost of memory usage and simplicity. If all you want is speed, this could be the way to go.
Calculation of list(pythagore_triplets(10000)) takes 40 seconds on my computer, versus 63 seconds for ΤΖΩΤΖΙΟΥ's algorithm, and possibly days of calculation for Tafkas's algorithm (and all other algorithms which use 3 embedded loops instead of just 2).
def pythagore_triplets(n=1000):
    maxn=int(n*(2**0.5))+1 # max int whose square may be the sum of two squares
    squares=[x*x for x in xrange(maxn+1)] # calculate all the squares once
    reverse_squares=dict([(squares[i],i) for i in xrange(maxn+1)]) # x*x=>x
    for x in xrange(1,n):
        x2 = squares[x]
        for y in xrange(x,n+1):
            y2 = squares[y]
            z = reverse_squares.get(x2+y2)
            # keep only z <= n, matching the sample output below
            if z != None and z <= n:
                yield x,y,z
>>> print list(pythagore_triplets(20))
[(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
Note that if you are going to calculate the first billion triplets, then this algorithm will crash before it even starts, because of an out of memory error. So ΤΖΩΤΖΙΟΥ's algorithm is probably a safer choice for high values of n.
BTW, here is Tafkas's algorithm, translated into python for the purpose of my performance tests. Its flaw is to require 3 loops instead of 2.
def gcd(a, b):
    while b != 0:
        t = b
        b = a%b
        a = t
    return a
def find_triple(upper_boundary=1000):
    for c in xrange(5,upper_boundary+1):
        for b in xrange(4,c):
            for a in xrange(3,b):
                if (a*a + b*b == c*c and gcd(a,b) == 1):
                    yield a,b,c

def pyth_triplets(n=1000):
    "Version 1"
    for x in xrange(1, n):
        x2= x*x # time saver
        for y in xrange(x+1, n): # y > x
            z2= x2 + y*y
            zs= int(z2**.5)
            if zs*zs == z2:
                yield x, y, zs
>>> print list(pyth_triplets(20))
[(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
V.1 algorithm has monotonically increasing x values.
EDIT
It seems this question is still alive :)
Since I came back and revisited the code, I tried a second approach which is almost 4 times as fast (about 26% of CPU time for N=10000) as my previous suggestion since it avoids lots of unnecessary calculations:
def pyth_triplets(n=1000):
    "Version 2"
    for z in xrange(5, n+1):
        z2= z*z # time saver
        x= x2= 1
        y= z - 1; y2= y*y
        while x < y:
            x2_y2= x2 + y2
            if x2_y2 == z2:
                yield x, y, z
                x+= 1; x2= x*x
                y-= 1; y2= y*y
            elif x2_y2 < z2:
                x+= 1; x2= x*x
            else:
                y-= 1; y2= y*y
>>> print list(pyth_triplets(20))
[(3, 4, 5), (6, 8, 10), (5, 12, 13), (9, 12, 15), (8, 15, 17), (12, 16, 20)]
Note that this algorithm has increasing z values.
If the algorithm were converted to C (where, being closer to the metal, multiplications take more time than additions), one could minimise the necessary multiplications, given the fact that the step between consecutive squares is:
(x+1)² - x² = (x+1)(x+1) - x² = x² + 2x + 1 - x² = 2x + 1
so all of the inner x2= x*x and y2= y*y would be converted to additions and subtractions like this:
def pyth_triplets(n=1000):
    "Version 3"
    for z in xrange(5, n+1):
        z2= z*z # time saver
        x= x2= 1; xstep= 3
        y= z - 1; y2= y*y; ystep= 2*y - 1
        while x < y:
            x2_y2= x2 + y2
            if x2_y2 == z2:
                yield x, y, z
                x+= 1; x2+= xstep; xstep+= 2
                y-= 1; y2-= ystep; ystep-= 2
            elif x2_y2 < z2:
                x+= 1; x2+= xstep; xstep+= 2
            else:
                y-= 1; y2-= ystep; ystep-= 2
Of course, in Python the extra bytecode produced actually slows down the algorithm compared to version 2, but I would bet (without checking :) that V.3 is faster in C.
Cheers everyone :)

I just extended Kyle Gullion's answer so that triples are sorted by hypotenuse, then longest side.
It doesn't use numpy, but requires a SortedCollection (or SortedList) such as this one
def primitive_triples():
    """ generates primitive Pythagorean triplets x<y<z
    sorted by hypotenuse z, then longest side y
    through Berggren's matrices and breadth first traversal of ternary tree
    :see: https://en.wikipedia.org/wiki/Tree_of_primitive_Pythagorean_triples
    """
    key=lambda x:(x[2],x[1])
    triples=SortedCollection(key=key)
    triples.insert([3,4,5])
    A = [[ 1,-2, 2], [ 2,-1, 2], [ 2,-2, 3]]
    B = [[ 1, 2, 2], [ 2, 1, 2], [ 2, 2, 3]]
    C = [[-1, 2, 2], [-2, 1, 2], [-2, 2, 3]]
    while triples:
        (a,b,c) = triples.pop(0)
        yield (a,b,c)
        # expand this triple to 3 new triples using Berggren's matrices
        for X in [A,B,C]:
            triple=[sum(x*y for (x,y) in zip([a,b,c],X[i])) for i in range(3)]
            if triple[0]>triple[1]: # ensure x<y<z
                triple[0],triple[1]=triple[1],triple[0]
            triples.insert(triple)
def triples():
    """ generates all Pythagorean triplets x<y<z
    sorted by hypotenuse z, then longest side y
    """
    prim=[] #list of primitive triples up to now
    key=lambda x:(x[2],x[1])
    samez=SortedCollection(key=key) # temp triplets with same z
    buffer=SortedCollection(key=key) # temp for triplets with smaller z
    for pt in primitive_triples():
        z=pt[2]
        if samez and z!=samez[0][2]: #flush samez
            while samez:
                yield samez.pop(0)
        samez.insert(pt)
        #build buffer of smaller multiples of the primitives already found
        for i,pm in enumerate(prim):
            p,m=pm[0:2]
            while True:
                mz=m*p[2]
                if mz < z:
                    buffer.insert(tuple(m*x for x in p))
                elif mz == z:
                    # we need another buffer because next pt might have
                    # the same z as the previous one, but a smaller y than
                    # a multiple of a previous pt ...
                    samez.insert(tuple(m*x for x in p))
                else:
                    break
                m+=1
            prim[i][1]=m #update multiplier for next loops
        while buffer: #flush buffer
            yield buffer.pop(0)
        prim.append([pt,2]) #add primitive to the list
the code is available in the math2 module of my Python library. It is tested against some series of the OEIS (code here at the bottom), which just enabled me to find a mistake in A121727 :-)

I wrote that program in Ruby, and it is similar to the Python implementation. The important line is:
if x*x == y*y + z*z && gcd(y,z) == 1:
Then you have to implement a method that return the greatest common divisor (gcd) of two given numbers. A very simple example in Ruby again:
def gcd(a, b)
while b != 0
t = b
b = a%b
a = t
end
return a
end
The full Ruby method to find the triplets would be:
def find_triple(upper_boundary)
(5..upper_boundary).each {|c|
(4..c-1).each {|b|
(3..b-1).each {|a|
if (a*a + b*b == c*c && gcd(a,b) == 1)
puts "#{a} \t #{b} \t #{c}"
end
}
}
}
end

Old question, but I'll still add my input.
There are two general ways to generate unique Pythagorean triples. One is by scaling, and the other is by using this archaic formula.
What scaling basically does is take a constant n and multiply a base triple, let's say 3,4,5, by n. So taking n to be 2, we get 6,8,10 as our next triple.
Scaling
def pythagoreanScaled(n):
    triplelist = []
    for x in range(1, n):  # start at 1 to skip the degenerate (0, 0, 0)
        one = 3*x
        two = 4*x
        three = 5*x
        triple = (one,two,three)
        triplelist.append(triple)
    return triplelist
The formula method uses the fact that if we take a number x greater than 1 and calculate 2x, x^2-1, and x^2+1, those three will always be a Pythagorean triplet.
Formula
def pythagoreantriple(n):
    triplelist = []
    for x in range(2,n):
        double = x*2
        minus = x**2-1
        plus = x**2+1
        triple = (double,minus,plus)
        triplelist.append(triple)
    return triplelist

Yes, there is.
Okay, now you'll want to know why. Why not just constrain it so that z > y? Try
for z in range (y+1, 1000)

from math import sqrt
from itertools import combinations
#Pythagorean triplet - a^2 + b^2 = c^2 for (a,b) <= (1999,1999)
def gen_pyth(n):
    if n >= 2000 :
        return
    # store c itself (not c squared) as the third element
    ELEM = [ [ i,j,int(sqrt(i*i + j*j)) ] for i , j in combinations(range(1, n + 1 ), 2) if sqrt(i*i + j*j).is_integer() ]
    print (*ELEM , sep = "\n")
gen_pyth(200)

for a in range(1,20):
    for b in range(1,20):
        for c in range(1,20):
            # a is the hypotenuse; c > b keeps each triplet only once
            if a>b and c>b:
                if a**2==b**2+c**2:
                    print("triplets are:",a,b,c)

In Python we can store the squares of all the numbers in another list,
then generate every pair of the given numbers,
square them,
and finally check whether the sum of any pair of squares appears in the list of squares.
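The steps above could be sketched as follows. This is one possible reading of the description: a dict is used instead of a list so that the lookup is fast, itertools.combinations supplies the pairs, and the function name is made up for the example.

```python
from itertools import combinations


def triples_from_square_lookup(numbers):
    """Find (a, b, c) among `numbers` with a < b and a*a + b*b == c*c."""
    numbers = sorted(numbers)
    # Map each square back to its root so a sum of squares can be looked up.
    roots = {n * n: n for n in numbers}
    found = []
    for a, b in combinations(numbers, 2):
        c = roots.get(a * a + b * b)
        if c is not None:
            found.append((a, b, c))
    return found


print(triples_from_square_lookup(range(1, 21)))
```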

Version 5 to Joel Neely.
X can be at most N-2 and Y at most N-1 for the range 1..N. Because x < y, we have 2 * x * x < x * x + y * y = z * z <= N * N, so X can be at most N / Sqrt ( 2 ), and it can start from 3 (the smallest triple is 3, 4, 5).
MaxX = N / ( 2 ** 0.5 )
for x in 3..MaxX {
    y = x+1
    z = y+1
    m = x*x + y*y
    k = z * z
    while z <= N {
        while k < m {
            z = z + 1
            k = k + (2*z) - 1
        }
        if k == m and z <= N then {
            // use x, y, z
        }
        y = y + 1
        m = m + (2 * y) - 1
    }
}

Just checking, but I've been using the following code to make Pythagorean triples. It's very fast (I've tried some of the examples here, though I learned from them, wrote my own, and came back to check here 2 years ago). I think this code correctly finds all Pythagorean triples up to (name your limit), and fairly quickly too. I used C++ to make it.
ullong is unsigned long long, and I created a couple of functions to square and root.
My root function basically says: if the integer square root of the given number, squared, is not equal to the given number, then return -1 because it is not rootable.
_square and _root do as expected per the description above. I know of another way to optimize it, but I haven't done or tested that yet.
generate(vector<Triple>& triplist, ullong limit) {
cout<<"Please wait as triples are being generated."<<endl;
register ullong a, b, c;
register Triple trip;
time_t timer = time(0);
for(a = 1; a <= limit; ++a) {
for(b = a + 1; b <= limit; ++b) {
c = _root(_square(a) + _square(b));
if(c != -1 && c <= limit) {
trip.a = a; trip.b = b; trip.c = c;
triplist.push_back(trip);
} else if(c > limit)
break;
}
}
timer = time(0) - timer;
cout<<"Generated "<<triplist.size()<<" in "<<timer<<" seconds."<<endl;
cin.get();
cin.get();
}
Let me know what you all think. It generates all primitive and non-primitive triples according to the teacher I turned it in for. (she tested it up to 100 if I remember correctly).
The results from the v4 supplied by a previous coder here are
Below is a not-very-scientific set of timings (using Java under Eclipse on my older laptop with other stuff running...), where the "use x, y, z" was implemented by instantiating a Triple object with the three values and putting it in an ArrayList. (For these runs, N was set to 10,000, which produced 12,471 triples in each case.)
Version 4: 46 sec.
using square root: 134 sec.
array and map: 400 sec.
The results from mine are
How many triples to generate: 10000
Please wait as triples are being generated.
Generated 12471 in 2 seconds.
That is before I even start optimizing via the compiler. (I remember previously getting 10000 down to 0 seconds with tons of special options and stuff). My code also generates all the triples with 100,000 as the limit of how high side1,2,hyp can go in 3.2 minutes (I think the 1,000,000 limit takes an hour).
I modified the code a bit and got the 10,000 limit down to 1 second (no optimizations). On top of that, with careful thinking, mine could be broken down into chunks and threaded over given ranges. For example, 100,000 could be divided into 4 equal chunks for 3 CPUs (1 extra to hopefully consume CPU time just in case), with ranges 1 to 25,000 (start at 1 and limit it to 25,000), 25,000 to 50,000, 50,000 to 75,000, and 75,000 to the end. I may do that and see if it speeds it up any (I will have the threads premade and not include them in the actual amount of time to execute the triple function). I'd need a more precise timer and a way to concatenate the vectors. I think that if one 3.4 GHz CPU with 8 GB of RAM at its disposal can do 10,000 as the limit in 1 second, then 3 CPUs should do that in 1/3 of a second (and I round up to the next second at the moment).

It should be noted that for a, b, and c you don't need to loop all the way to N.
For a, you only have to loop from 1 to int(sqrt(n**2/2))+1, for b, a+1 to int(sqrt(n**2-a**2))+1, and for c from int(sqrt(a**2+b**2)) to int(sqrt(a**2+b**2))+2.
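A rough sketch of those tightened bounds is shown below. It assumes Python 3.8+ for math.isqrt, and instead of looping c over a two-value range it simply tests the single integer square root candidate, which amounts to the same check. The function name is illustrative.

```python
import math


def bounded_triples(n):
    """Triples a < b < c <= n using the reduced loop ranges."""
    out = []
    # a < b implies 2*a*a < c*c <= n*n, so a never needs to exceed sqrt(n*n/2).
    for a in range(1, math.isqrt(n * n // 2) + 1):
        # b*b <= n*n - a*a guarantees that c cannot exceed n.
        for b in range(a + 1, math.isqrt(n * n - a * a) + 1):
            c = math.isqrt(a * a + b * b)
            if c * c == a * a + b * b:
                out.append((a, b, c))
    return out


print(bounded_triples(20))
```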

# To find all pythagorean triplets in a range
import math
n = int(input('Enter the upper range of limit'))
for i in range(n+1):
    for j in range(1, i):
        k = math.sqrt(i*i + j*j)
        # k must be a whole number no larger than n
        if k % 1 == 0 and k <= n:
            print(i,j,int(k))

You have to use Euclid's formula for Pythagorean triplets. Follow along below.
Choose any two integers m and n with m > n > 0.
According to Euclid, the triplet is a = m*m - n*n, b = 2*m*n, c = m*m + n*n.
Now apply this formula to find triplets. Say one value of our triplet is 6; what are the other two? Let's solve.
b = 2*m*n is always even, so we set
2*m*n = 6 => m*n = 3 => m*n = 3*1 => m = 3, n = 1
In general you can take any pair m > n whose product is half the given number (here m*n = 3).
Now, with m = 3 and n = 1:
a = m*m - n*n = 3*3 - 1*1 = 8, c = m*m + n*n = 3*3 + 1*1 = 10
So 6, 8, 10 is our triplet for the value 6. That is how triplets are generated.
If the given number is odd, like 7, the approach changes slightly, because b = 2*m*n can never be odd. Here we take
a = m*m - n*n = 7 => (m+n)*(m-n) = 7*1, so m+n = 7 and m-n = 1
Solving gives m = 4 and n = 3, so the other two values are b = 2*4*3 = 24 and c = 4*4 + 3*3 = 25.
If it isn't clear, read it through again carefully.
Write your code according to this and it will generate distinct triplets efficiently.
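One way to turn Euclid's formula into code is sketched below. The function name `euclid_triples` and the `limit` parameter are illustrative. The extra conditions (m and n coprime and of opposite parity) are an addition not stated in the answer above: they restrict the output to primitive triplets, so 6, 8, 10 will not appear, only 3, 4, 5.

```python
from math import gcd

def euclid_triples(limit):
    """Return sorted primitive triples (a, b, c) with c <= limit."""
    triples = []
    m = 2
    while m * m + 1 <= limit:      # smallest c for this m is m*m + 1
        for n in range(1, m):
            # coprime with opposite parity: the triple is primitive
            if (m - n) % 2 == 1 and gcd(m, n) == 1:
                a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
                if c <= limit:
                    triples.append((min(a, b), max(a, b), c))
        m += 1
    return sorted(triples)
```

Multiplying each primitive triplet by 2, 3, ... while c stays within the limit would give the non-primitive ones as well.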

A non-numpy version of the Hall/Roberts approach is
def pythag3(limit=None, all=False):
    """generate Pythagorean triples which are primitive (default)
    or without restriction (when ``all`` is True). The elements
    returned in the tuples are sorted with the smallest first.

    Examples
    ========

    >>> list(pythag3(20))
    [(3, 4, 5), (8, 15, 17), (5, 12, 13)]
    >>> list(pythag3(20, True))
    [(3, 4, 5), (6, 8, 10), (9, 12, 15), (12, 16, 20), (8, 15, 17), (5, 12, 13)]

    """
    if limit and limit < 5:
        return
    m = [(3,4,5)]  # primitives stored here
    while m:
        x, y, z = m.pop()
        if x > y:
            x, y = y, x
        yield (x, y, z)
        if all:
            a, b, c = x, y, z
            while 1:
                c += z
                if c > limit:
                    break
                a += x
                b += y
                yield a, b, c
        # new primitives
        a = x - 2*y + 2*z, 2*x - y + 2*z, 2*x - 2*y + 3*z
        b = x + 2*y + 2*z, 2*x + y + 2*z, 2*x + 2*y + 3*z
        c = -x + 2*y + 2*z, -2*x + y + 2*z, -2*x + 2*y + 3*z
        for d in (a, b, c):
            if d[2] <= limit:
                m.append(d)
It's slower than the numpy-coded version but the primitives with largest element less than or equal to 10^6 are generated on my slow machine in about 1.4 seconds. (And the list m never grew beyond 18 elements.)

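In C:
#include <stdio.h>

int main(void)
{
    int n;
    printf("How many triplets needed : \n");
    /* No trailing "\n" in the format: it would make scanf wait for more input. */
    if (scanf("%d", &n) != 1)
        return 1;
    /* Stop all three loops as soon as n triplets have been printed. */
    for (int i = 1; i <= 2000 && n > 0; i++)
    {
        for (int j = i; j <= 2000 && n > 0; j++)
        {
            for (int k = j; k <= 2000 && n > 0; k++)
            {
                if (j * j + i * i == k * k)
                {
                    printf("%d %d %d\n", i, j, k);
                    n = n - 1;
                }
            }
        }
    }
    return 0;
}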

You can try this:
triplets = []
for a in range(1, 100):
    for b in range(a, 100):      # start at a so each triplet is found only once
        for c in range(b, 100):
            if a**2 + b**2 == c**2:
                triplets.append([a, b, c])
print(triplets)
