I have written some very simple trial-and-error code in Sage (a computer algebra system written in Python, where you can use regular Python syntax in scripts). The little code snippet creates a polynomial and does some calculations with its coefficients; in particular, it determines the Gröbner basis for the ideal generated by three expressions in the coefficients.
The problem is: the program runs on and eats up all my memory until it's killed by the kernel. Every iteration consumes only about 200 kB, but this memory is never freed again.
Here is the code. The details are very bulky and not that important, so they are left out:
R = PolynomialRing(QQ, 2, 'bc', order='lex')
expr1, expr2, expr3 = ...

for i in range(0, 50):
    for j in range(i+1, 50):
        for k in range(j+1, 50):
            for l in range(k+1, 50):
                for m in range(l+1, 50):
                    for n in range(m+1, 50):
                        poly = (x-i)*(x-j)*(x-k)*(x-l)*(x-m)*(x-n)
                        r = poly.coeffs()
                        p1 = expr1.substitute(r...)
                        p2 = expr2.substitute(r...)
                        p3 = expr3.substitute(r...)
                        I = (p1, p2, p3)*R
                        B = I.groebner_basis()
As far as I understand Python's memory management, the variables in the loop body should be freed every so often. So this may be a programming problem, an internal Python problem, or some problem in the Sage routines; I don't know. Can you spot a problem with my code, or is it something else?
The problem doesn't appear to be your loops (in python2.7, OS-X 10.5.8):
a = 0
for i in range(0, 50):
    for j in range(i+1, 50):
        for k in range(j+1, 50):
            for l in range(k+1, 50):
                for m in range(l+1, 50):
                    for n in range(m+1, 50):
                        a += 1
print(a)
This takes very little additional memory on both Python 2.x and 3.x, and it really doesn't take all that long to run either:
time python test.py
15890700
real 0m6.015s
user 0m5.940s
sys 0m0.032s
Perhaps something is funky when running with sage? Or maybe it's something else in your loops that is causing the problem...
The origin of the bug might be the `__call__` method on multivariate polynomials. Something as innocent as:

for i in xrange(really_big_number):
    polynomial(1, 0, 0, 0) == 0

will blow up in memory. This might happen either in p1 = expr1.substitute(r...) or deep inside the algorithm for the Gröbner basis.
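If you want to confirm that references are being retained, here is a minimal diagnostic sketch (assuming Sage's usual PolynomialRing interface; the growth of the object count, not the exact numbers, is what matters):

import gc

R = PolynomialRing(QQ, 4, 'abcd', order='lex')
a, b, c, d = R.gens()
polynomial = a*b + c*d

for i in xrange(100000):
    polynomial(1, 0, 0, 0) == 0        # the suspected leaky __call__
    if i % 10000 == 0:
        gc.collect()                   # force a collection pass
        print len(gc.get_objects())    # steady growth => something holds references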
I am trying to minimize a function of 3 input variables using scipy. The function reads like so:

from scipy.optimize import minimize

def myfunc(x):
    a = x[0]
    b = x[1]
    c = x[2]
    n = f(a, b, c)
    return n

bound1 = (80, 100)
bound2 = (10, 20)
bound3 = (312, 740)
guess = [a0, b0, c0]
bds = (bound1, bound2, bound3)
result = minimize(myfunc, guess, method='L-BFGS-B', bounds=bds)
The function I am currently running reaches its minimum at a=100, b=10, c=740, which is at the edge of the bounds.
The minimize function keeps trying to iterate past the end of bound 3 (it gets to a c value of 740.0000000149012 on its last iteration).
Is there any way to stop this from happening, i.e. to stop the iteration at the actual end of my bound?
This happens due to numerical differentiation, which is needed not only to infer the step direction and size, but also to reason about termination.
In general you can't do much here without being very careful about whichever solver (and there are many backend solvers) is being used. The basic idea is to replace the automatic numerical differentiation with one provided by you: yours then respects the bounds, but it must also be careful about the solver's internals, e.g. "how to reason about termination at this end".
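For illustration, here is a minimal sketch of that idea (the helper bounded_grad and its step logic are my own illustration, not scipy API): a one-sided finite-difference gradient that steps backward whenever a forward step would cross the upper bound, passed to the solver via jac so that scipy's automatic differencing is never invoked:

import numpy as np
from scipy.optimize import minimize

def bounded_grad(fun, x, ub, eps=1e-8):
    # one-sided finite differences that never evaluate fun outside the bounds
    g = np.empty_like(x, dtype=float)
    f0 = fun(x)
    for i in range(len(x)):
        h = eps if x[i] + eps <= ub[i] else -eps  # step backward at the upper bound
        xe = x.copy()
        xe[i] += h
        g[i] = (fun(xe) - f0) / h
    return g

result = minimize(myfunc, guess, method='L-BFGS-B', bounds=bds,
                  jac=lambda x: bounded_grad(myfunc, x, np.array([100, 20, 740])))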
Fix A:
Your problem should vanish automatically when using pull request #10673, which touches your configuration: L-BFGS-B.
It seems this PR is not part of the current release, SciPy 1.4.1 (which came out about two months before the PR).
See also #6026, where a milestone of 1.5.0 is mentioned in regards to some changes, including respecting bounds in numerical differentiation.
To get the above PR you will need to install SciPy from source, which is:
- quite doable on Linux (and maybe OS X)
- not something you should try on Windows! Trust me...
See the documentation if needed.
Fix B:
Apart from that: since you are doing unconstrained optimization (with variable bounds), for which more solver backends are available than for constrained optimization, you might try another solver, trust-constr, which has explicit support for this; see #9098.
Be careful to note that you need to signal this explicitly when setting up the bounds!
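For illustration, a minimal sketch of Fix B, reusing myfunc and guess from the question; the keep_feasible flag on scipy.optimize.Bounds is the explicit signal meant above:

from scipy.optimize import minimize, Bounds

# keep_feasible=True tells trust-constr to keep every iterate inside the bounds
bds = Bounds([80, 10, 312], [100, 20, 740], keep_feasible=True)
result = minimize(myfunc, guess, method='trust-constr', bounds=bds)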
I used sympy to derive, via Lagrange, the equations of motion of my 3-link robot. The resulting equations of motion, in the form theta_dot_dot = f(theta, theta_dot), turned out very complicated, with a LOT of cos and sin terms. I then lambdified the functions to use with Drake, replacing all the sympy.sin and sympy.cos with drake.sin and drake.cos.
The final function can be evaluated numerically (i.e. given theta and theta_dot, find theta_dot_dot) somewhat efficiently, in the milliseconds range.
I then tried to use direct transcription to do trajectory optimization. Note that I did not use the DirectTranscription library; instead I manually added the necessary constraints.
The constraints are added roughly as follows:
for i in range(NUM_TIME_STEPS-1):
    print("Adding constraints for t = " + str(i))
    tau = mp.NewContinuousVariables(3, "tau_%d" % i)
    next_state = mp.NewContinuousVariables(8, "state_%d" % (i+1))
    for j in range(8):
        mp.AddConstraint(next_state[j] <= (state_over_time[i] + TIME_INTERVAL*derivs(state_over_time[i], tau))[j])
        mp.AddConstraint(next_state[j] >= (state_over_time[i] + TIME_INTERVAL*derivs(state_over_time[i], tau))[j])
    state_over_time[i+1] = next_state
    tau_over_time[i] = tau
The problem I'm facing right now is that on each iteration of adding constraints, my memory usage increases by around 70-100 MB. This means that my number of time steps cannot go much beyond 50 before the program crashes from running out of memory.
I'm wondering what I can do to make trajectory optimization work for my robot. Obviously I can try to simplify the equations of motion (by hand or otherwise)... but is there anything else I can try? Is it even normal for the constraints to take up so much memory? Am I doing something very wrong here?
You're pushing Drake's symbolic expressions through your complex equations. Making that better is a good goal, but you probably want to avoid it entirely by using the other overload of AddConstraint:
AddConstraint(your_method, lb, ub, vars)
https://drake.mit.edu/pydrake/pydrake.solvers.mathematicalprogram.html?highlight=addconstraint#pydrake.solvers.mathematicalprogram.MathematicalProgram.AddConstraint
That overload uses your Python code as a function pointer, and should use autodiff instead of symbolic expressions.
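A rough sketch of the rewritten loop, reusing the names from the question (derivs, state_over_time, TIME_INTERVAL, NUM_TIME_STEPS); it assumes derivs can be evaluated on the autodiff arrays that the callback overload will feed it:

import numpy as np

def dynamics_defect(vars):
    # one flat array of decision variables: 8 state, 3 tau, 8 next state
    state, tau, next_state = vars[:8], vars[8:11], vars[11:]
    return next_state - (state + TIME_INTERVAL * derivs(state, tau))

for i in range(NUM_TIME_STEPS - 1):
    tau = mp.NewContinuousVariables(3, "tau_%d" % i)
    next_state = mp.NewContinuousVariables(8, "state_%d" % (i + 1))
    # pin the defect to zero instead of building a huge symbolic expression
    mp.AddConstraint(dynamics_defect, lb=np.zeros(8), ub=np.zeros(8),
                     vars=np.concatenate([state_over_time[i], tau, next_state]))
    state_over_time[i + 1] = next_state
    tau_over_time[i] = tau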
I think I may have implemented this incorrectly because the results do not make sense. I have a Go program that counts to 1000000000:
package main

import (
    "fmt"
)

func main() {
    for i := 0; i < 1000000000; i++ {}
    fmt.Println("Done")
}
It finishes in less than a second. On the other hand I have a Python script:
x = 0
while x < 1000000000:
    x += 1
print 'Done'
It finishes in a few minutes.
Why is the Go version so much faster? Are they both counting up to 1000000000 or am I missing something?
One billion is not a very big number. Any reasonably modern machine should be able to do this in a few seconds at most, if it's able to do the work with native types. I verified this by writing an equivalent C program, reading the assembly to make sure that it actually was doing addition, and timing it (it completes in about 1.8 seconds on my machine).
Python, however, doesn't have a concept of natively typed variables (or meaningful type annotations at all), so it has to do hundreds of times as much work in this case. In short, the answer to your headline question is "yes". Go really can be that much faster than Python, even without any kind of compiler trickery like optimizing away a side-effect-free loop.
pypy actually does an impressive job of speeding up this loop:

import time

def main():
    x = 0
    while x < 1000000000:
        x += 1

if __name__ == "__main__":
    s = time.time()
    main()
    print time.time() - s
$ python count.py
44.221405983
$ pypy count.py
1.03511095047
That's about a 98% reduction in runtime (a ~43x speedup)!
Clarification for the three people who didn't "get it": the Python language itself isn't slow. CPython is a relatively straightforward implementation of the language; PyPy is another implementation that does many tricky things (especially the JIT) that can make enormous differences. Directly answering the question in the title: Go isn't "that much" faster than Python, Go is that much faster than CPython.
Having said that, the code samples aren't really doing the same thing. Python needs to instantiate 1000000000 of its int objects. Go is just incrementing one memory location.
This scenario will highly favor decent natively-compiled, statically-typed languages, which are capable of emitting a very trivial loop of, say, 4-6 CPU opcodes using a simple condition check for termination. This loop has effectively zero branch-prediction misses and can be thought of as performing an increment every CPU cycle (this isn't entirely true, but...).
Python implementations have to do significantly more work, primarily due to dynamic typing. Python must make several different calls (internal and external) just to add two ints together. Each i += 1 is effectively i = i.__add__(1), which has to check the type of the value passed (to make sure it is an int), then add the integer values (extracting them from both objects), then wrap the new integer value up in a new object, and finally re-assign the new object to the local variable. That's significantly more work than a single opcode to increment, and it doesn't even address the loop itself; by comparison, the Go/native version is likely just incrementing a register.
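You can see this dispatch overhead directly with the standard dis module, which shows the bytecode CPython must interpret on every trip through the loop:

import dis

def count(n):
    x = 0
    while x < n:
        x += 1

# prints instructions such as LOAD_FAST / INPLACE_ADD / STORE_FAST plus jumps
# (exact names vary by version), each performing the dynamic checks described above
dis.dis(count)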
Java will fare much better in a trivial benchmark like this and will likely be fairly close to Go; the JIT and the static typing of the counter variable ensure this (it uses a special integer-add JVM instruction). Once again, Python has no such advantage. Now, there are some implementations like PyPy/RPython, which run a static-typing phase and should fare much better than CPython here...
You've got two things at work here. The first of which is that Go is compiled to machine code and run directly on the CPU while Python is compiled to bytecode run against a (particularly slow) VM.
The second, and more significant, thing impacting performance is that the semantics of the two programs are actually significantly different. The Go version makes a "box" called "x" that holds a number and increments that by 1 on each pass through the program. The Python version actually has to create a new "box" (int object) on each cycle (and, eventually, has to throw them away). We can demonstrate this by modifying your programs slightly:
package main

import (
    "fmt"
)

func main() {
    for i := 0; i < 10; i++ {
        fmt.Printf("%d %p\n", i, &i)
    }
}
...and:
x = 0
while x < 10:
    x += 1
    print x, id(x)
This is because Go, due to its C roots, takes a variable name to refer to a place, whereas Python takes variable names to refer to things. Since an integer is a unique, immutable entity in Python, we must constantly make new ones. Python should be slower than Go, but you've picked a worst-case scenario: in the Benchmarks Game, we see Go being, on average, about 25x faster (100x in the worst case).
You've probably read that, if your Python programs are too slow, you can speed them up by moving things into C. Fortunately, in this case, somebody's already done this for you. If you rewrite your empty loop to use xrange() like so:
for x in xrange(1000000000):
    pass
print "Done."
...you'll see it run about twice as fast. If you find loop counters to actually be a major bottleneck in your program, it might be time to investigate a new way of solving the problem.
@troq
I'm a little late to the party, but I'd say the answer is yes and no. As @gnibbler pointed out, CPython is slower in the simple implementation, but PyPy JIT-compiles the code for much faster execution when you need it.
If you're doing numeric processing with CPython, most people do it with numpy, giving fast operations on arrays and matrices. Recently I've been doing a lot with numba, which lets you add a simple decorator to your code. For this one I just added @njit to a function incALot(), which runs your code above.
On my machine CPython takes 61 seconds, but with the numba wrapper it takes 7.2 microseconds, which will be similar to C and maybe faster than Go. That's roughly an 8-million-times speedup.
So, in Python, if things with numbers seem a bit slow, there are tools to address it - and you still get Python's programmer productivity and the REPL.
import time
from numba import njit

def incALot(y):
    x = 0
    while x < y:
        x += 1

@njit('i8(i8)')
def nbIncALot(y):
    x = 0
    while x < y:
        x += 1
    return x

size = 1000000000

start = time.time()
incALot(size)
t1 = time.time() - start

start = time.time()
x = nbIncALot(size)
t2 = time.time() - start

print('CPython3 takes %.3fs, Numba takes %.9fs' % (t1, t2))
print('Speedup is: %.1f' % (t1/t2))
print('Just Checking:', x)
CPython3 takes 58.958s, Numba takes 0.000007153s
Speedup is: 8242982.2
Just Checking: 1000000000
The problem is that Python is interpreted and Go isn't, so there's no real way to benchmark their speeds fairly. Interpreted languages usually (though not always) have a VM component, and that's where the problem lies: any test you run is run within interpreter bounds, not actual runtime bounds. Go is slightly slower than C in terms of speed, mostly because it uses garbage collection instead of manual memory management. That said, Go compared to Python is fast because it's a compiled language. The only thing lacking in Go is bug testing; I stand corrected if I'm wrong.
It is possible that the compiler realized that you didn't use the "i" variable after the loop, so it optimized the final code by removing the loop.
Even if you used it afterwards, the compiler is probably smart enough to substitute the loop with
i = 1000000000;
Hope this helps =)
I'm not familiar with Go, but I'd guess the Go version ignores the loop, since its body does nothing. On the other hand, in the Python version you are incrementing x in the body of the loop, so it's probably actually executing it.
I wrote the following code in Python to solve problem 15 from Project Euler:
grid_size = 2

def get_paths(node):
    global paths
    if node[0] >= grid_size and node[1] >= grid_size:
        paths += 1
        return
    else:
        if node[0] < grid_size+1 and node[1] < grid_size+1:
            get_paths((node[0]+1, node[1]))
            get_paths((node[0], node[1]+1))
    return paths

def euler():
    print get_paths((0,0))

paths = 0

if __name__ == '__main__':
    euler()
Although it runs quite well for a 2 X 2 grid, it's been running for hours for a 20 X 20 grid. How can I optimise the code so that it can run on larger grids?
Is it a kind of breadth first search problem? (It seems so to me.)
How can I measure the complexity of my solution in its current form?
You might want to look into the maths behind this problem. It's not necessary to actually iterate through all routes. (In fact, you'll never make the 1 minute mark like that).
I can post a hint but won't do so unless you ask for it, since I wouldn't want to spoil it for you.
Edit:
Yes, the algorithm you're using will never really be optimal since there's no way to reduce the search space of your problem. This means that (as pg1989 stated) you'll have to look into alternative means of solving this problem.
As sverre said, looking over here might give a nudge in the right direction:
http://en.wikipedia.org/wiki/Binomial_coefficient
A direct solution may be found here (warning, big spoiler):
http://www.joaoff.com/2008/01/20/a-square-grid-path-problem/
Your algorithm is exponential, but only because you are re-evaluating get_paths with the same input many times. Adding memoization will make it run in time. Also, you'll need to get rid of the global variable and use return values instead. See also dynamic programming for a similar idea.
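For illustration, a sketch of that suggestion applied to the question's problem: no global, memoized, and counting paths in an x-by-y grid via return values. get_paths(20, 20) finishes almost instantly:

memo = {}

def get_paths(x, y):
    if x == 0 or y == 0:
        return 1                      # only one way along an edge of the grid
    if (x, y) not in memo:
        memo[(x, y)] = get_paths(x-1, y) + get_paths(x, y-1)
    return memo[(x, y)]

print get_paths(20, 20)               # 137846528820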
When solving problems on Project Euler, think about the math behind the problem for a long time before starting to code. This problem can be solved without any code whatsoever.
We're trying to count the number of ways through a grid. If you observe that the numbers of down moves and right moves do not change regardless of the path, then you only need to worry about the order in which you move down and right. So in the 2x2 case, the following combinations work:
DDRR
DRDR
RDRD
RRDD
RDDR
DRRD
Notice that if we pick where we put the R moves, the placement of the D moves is determined. So really we only have to choose, from the 4 movement slots available, which get the R moves. Can you think of a mathematical operation that does this?
Probably not the way the Project Euler guys wanted this problem to be solved, but the answer is just the central binomial coefficient for a 20x20 grid.
Using the formula provided in the wiki article you get (note: integer division here, rather than math.pow, avoids losing precision to floats):

from math import factorial

grid = 20
print factorial(2 * grid) // factorial(grid) ** 2
The key is not to make your algorithm run faster, as it will (potentially) run in exponential time, no matter how fast each step is.
It is probably better to find another way of computing the answer. Using your (expensive, but correct) solution as a comparison for small values is probably a sanity-preserver during the algorithm optimization effort.
This question provides some good insight into optimization. The code is in c# but the algorithms are applicable. Watch out for spoilers, though.
Project Euler #15
It can be solved by simple observation of the pattern for small grids, and determining a straightforward formula for larger grids. There are over 100 billion paths for a 20x20 grid and any iterative solution will take too long to compute.
Here's my solution:
memo = {(0, 1) : 1, (1, 0) : 1}
def get_pathways(x, y):
if (x, y) in memo : return memo[(x, y)]
pathways = 0
if 0 in (x, y):
pathways = 1
else:
pathways = get_pathways(x-1, y) + get_pathways(x, y-1)
memo[(x, y)] = pathways
return pathways
enjoy :)
Here are two programs that naively calculate the number of prime numbers <= n.
One is in Python and the other is in Java.
public class prime{
    public static void main(String args[]){
        int n = Integer.parseInt(args[0]);
        int nps = 0;
        boolean isp;
        for(int i = 1; i <= n; i++){
            isp = true;
            for(int k = 2; k < i; k++){
                if( (i*1.0 / k) == (i/k) ) isp = false;
            }
            if(isp){nps++;}
        }
        System.out.println(nps);
    }
}
#!/usr/bin/python
import sys

n = int(sys.argv[1])
nps = 0
for i in range(1, n+1):
    isp = True
    for k in range(2, i):
        if( (i*1.0 / k) == (i/k) ): isp = False
    if isp == True: nps = nps + 1
print nps
Running them on n=10000 I get the following timings.
shell:~$ time python prime.py 10000 && time java prime 10000
1230
real 0m49.833s
user 0m49.815s
sys 0m0.012s
1230
real 0m1.491s
user 0m1.468s
sys 0m0.016s
Am I using for loops in python in an incorrect manner here or is python actually just this much slower?
I'm not looking for an answer that is specifically crafted for calculating primes but rather I am wondering if python code is typically utilized in a smarter fashion.
The Java code was compiled with
javac 1.6.0_20
Run with java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8.1) (6b18-1.8.1-0ubuntu1~9.10.1)
OpenJDK Client VM (build 16.0-b13, mixed mode, sharing)
Python is:
Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
As has been pointed out, straight Python really isn't made for this sort of thing. That the prime checking algorithm is naive is also not the point. However, with two simple things I was able to greatly reduce the time in Python while using the original algorithm.
First, put everything inside of a function, call it main() or something. This decreased the time on my machine in Python from 20.6 seconds to 14.54 seconds. Doing things globally is slower than doing them in a function.
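For example, simply moving the original loop into a function is enough to get that first speedup (local variable lookups are faster than global ones in CPython):

import sys

def main():
    n = int(sys.argv[1])
    nps = 0
    for i in range(1, n+1):
        isp = True
        for k in range(2, i):
            if (i*1.0 / k) == (i/k): isp = False
        if isp: nps = nps + 1
    print nps

main()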
Second, use Psyco, a JIT compiler. This requires adding two lines to the top of the file (and of course having psyco installed):
import psyco
psyco.full()
This brought the final time to 2.77 seconds.
One last note: I decided, for kicks, to try Cython on this and got the time down to 0.8533 seconds. However, knowing which few changes to make to get fast Cython code isn't something I'd recommend for the casual user.
Yes, Python is slow, about a hundred times slower than C. You can use xrange instead of range for a small speedup, but other than that it's fine.
Ultimately what you're doing wrong is that you do this in plain Python, instead of using optimized libraries such as Numpy or Psyco.
Java comes with a JIT compiler, which makes a big difference when you're just crunching numbers.
You can make your Python about twice as fast by replacing that complicated test with
if i % k == 0: isp = False
You can also make it about eight times faster (for n=10000) than that by adding a break after that isp = False.
Also, do yourself a favor and skip the even numbers (adding one to nps to start to include 2).
Finally, you only need k to go up to sqrt(i).
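Putting all of those suggestions together looks something like this (a sketch; note it counts 1229 for n=10000 rather than 1230, because unlike the original it does not count 1 as prime):

nps = 1                                  # count 2 up front, then test odds only
for i in range(3, n+1, 2):
    isp = True
    for k in range(3, int(i ** 0.5) + 1, 2):
        if i % k == 0:                   # the simpler modulo test
            isp = False
            break                        # stop at the first divisor found
    if isp: nps = nps + 1
print nps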
Of course, if you make the same changes in the Java, it's still about 10x faster than the optimized Python.
Boy, when you said it was a naive implementation, you sure weren't joking!
But yes, a one to two order of magnitude difference in performance is not unexpected when comparing JIT-compiled, optimized machine code with an interpreted language. An alternative Python implementation such as Jython, which runs on the Java VM, may well be faster for this task; you could give it a whirl. Cython, which allows you to add static typing to Python and get C-like performance in some cases, may be worth investigating as well.
Even when considering the standard Python interpreter, CPython, though, the question is: is Python fast enough for the task at hand? Will the time you save writing the code in a dynamic language like Python make up for the extra time spent running it? If you had to write a given program in Java, would it seem like too much work to be worth the trouble?
Consider, for example, that a Python program running on a modern computer will be about as fast as a Java program running on a 10-year-old computer. The computer you had ten years ago was fast enough for many things, wasn't it?
Python does have a number of features that make it great for numerical work. These include an integer type that supports an unlimited number of digits, a decimal type with unlimited precision, and an optional library called NumPy specifically for calculations. Speed of execution, however, is not generally one of its major claims to fame. Where it excels is in getting the computer to do what you want with minimal cognitive friction.
If you're looking to do it fast, Python probably isn't the way forward, but you could speed it up a bit. First, you're using quite a slow way to test for divisibility. Modulo is quicker. You can also stop the inner loop (with k) as soon as it detects a match. I'd do something like this:
nps = 0
for i in range(1, n+1):
    if all(i % k for k in range(2, i)):  # i.e. if divisible by none of them
        nps += 1
That brings it down from 25 s to 1.5 s for me. Using xrange brings it down to 0.9 s.
You could speed it up further by keeping a list of primes you've already found, and only testing those, rather than every number up to i (if i isn't divisible by 2, it won't be divisible by 4, 6, 8...).
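A sketch of that last idea: remember the primes found so far and trial-divide only by those no bigger than sqrt(i):

primes = []
for i in range(2, n+1):
    if all(i % p for p in primes if p * p <= i):
        primes.append(i)
print len(primes)                        # 1229 for n=10000 (1 is not counted)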
Why don't you post something about the memory usage, and not just the speed? Trying to get a simple servlet running on Tomcat is wasting 3 GB on my server.
What you did with the examples above is not very good. You need to use numpy. Replace for/range with while loops, thus avoiding the list creation.
In the end, Python is quite suitable for number crunching, at least when used by people who do it the right way and know what the Sieve of Eratosthenes, or the mod operation, is.
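For what it's worth, a sketch of the numpy route, a vectorized Sieve of Eratosthenes (the function name is my own):

import numpy as np

def count_primes(n):
    sieve = np.ones(n + 1, dtype=bool)   # index i is True while i may be prime
    sieve[:2] = False                    # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = False        # knock out every multiple of i
    return int(sieve.sum())

print count_primes(10000)                # 1229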
There are lots of things you can do to this algorithm to speed it up, but most of them would also speed up the Java version as well. Some of those will speed up the Python more than the Java, so they're worth testing.
Here are just a couple of changes that speed it up from 11.4 to 2.8 seconds on my system:
nps = 0
for i in range(1, n+1):
    isp = True
    for k in range(2, i):
        isp = isp and (i % k != 0)
    if isp: nps = nps + 1
print nps
Python is a language which, ironically, is well-suited for developing algorithms. Even a modified algorithm like this:
# See Thomas K for use of all(), many posters for sqrt optimization
nps = 0
for i in xrange(1, n+1):
    if all(i % k for k in xrange(2, 1 + int(i ** 0.5))):
        nps += 1
runs in significantly under one second. Code like this:
def eras(n):
    last = n + 1
    sieve = [0, 0] + range(2, last)
    sqn = int(round(n ** 0.5))
    it = (i for i in xrange(2, sqn + 1) if sieve[i])
    for i in it:
        sieve[i*i:last:i] = [0] * (n//i - i + 1)
    return filter(None, sieve)
is faster still. Or try out these.
The thing is, Python is usually fast enough for designing your solution. If it is not fast enough for production, use numpy or Jython to goose more performance out of it, or move it to a compiled language, taking the algorithm insights you learned in Python with you.
Yes, Python is one of the slowest practical languages you'll encounter. while loops are marginally faster than for i in xrange(), but ultimately Python will always be much, much slower than anything else.
Python has its place: prototyping theory and ideas, or any situation where the ability to produce code quickly matters more than that code's performance.
Python is a scripting language, not a programming language.