When I try to use ComputeBandStats it takes an extremely long time to finish. Is there any way to speed up the process?
Here is my code:
inIMG = gdal.Open(infile)
bandas = [inIMG.GetRasterBand(b+1) for b in range(3)]
print('hej1')
meanSD = [b.ComputeBandStats(1) for b in bandas]
print('hej2')
It prints out "hej1" pretty fast, but it only writes "hej2" after several hours. Therefore it seems that ComputeBandStats is the problem.
I tried it with no argument (which worked at an earlier date) and with 1, but it doesn't seem to make any difference.
(I am using python 2.7 and gdal 1.11.3)
I found out that ComputeStatistics is much faster than ComputeBandStats, so I'm using it instead. I don't know exactly what the difference is, but aside from the speed advantage, ComputeStatistics also ignores no-data values, which turned out to be a problem for ComputeBandStats. It also calculates min, max, mean and std, all of which I needed anyway.
This is the change I made:
inIMG = gdal.Open(infile)
bandas = [inIMG.GetRasterBand(b+1) for b in range(3)]
print('hej1')
stats = [b.ComputeStatistics(False) for b in bandas]
print('hej2')
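If it helps, ComputeStatistics(False) returns the statistics for each band as a list of [min, max, mean, std], so the values can be unpacked directly. A small sketch (not from the original post):

for band_no, (bmin, bmax, bmean, bstd) in enumerate(stats, start=1):
    # one statistics list per band, in the order min, max, mean, std
    print('band %d: min=%g max=%g mean=%g std=%g' % (band_no, bmin, bmax, bmean, bstd))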
I am trying to minimize a function of 3 input variables using scipy. The function reads like so:
from scipy.optimize import minimize

def myfunc(x):
    a = x[0]
    b = x[1]
    c = x[2]
    n = f(a, b, c)
    return n

bound1 = (80, 100)
bound2 = (10, 20)
bound3 = (312, 740)
guess = [a0, b0, c0]
bds = (bound1, bound2, bound3)
result = minimize(myfunc, guess, method='L-BFGS-B', bounds=bds)
The function I am currently trying to minimize reaches its minimum at a=100, b=10, c=740, which is at the edge of the bounds.
The minimize function keeps trying to iterate past the end of bound 3 (it reaches a c value of 740.0000000149012 on its last iteration).
Is there any way to stop this from happening, i.e. to stop the iteration at the actual end of my bound?
This happens due to numerical differentiation, which is needed not only to infer the step direction and size, but also to reason about termination.
In general you can't do much without being very careful about whichever solver (and there are many backend solvers) is being used. The basic idea is to replace the automatic numerical differentiation with a gradient provided by you: that gradient can respect the bounds, but you then have to be careful about the solver's internals, e.g. how it reasons about termination at that end of the bound.
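For illustration, here is a minimal, self-contained sketch of supplying your own gradient via jac= (using a toy quadratic objective, not the asker's f), so the solver never has to take finite-difference steps past the bounds:

import numpy as np
from scipy.optimize import minimize

def toy(x):
    # toy objective whose minimum sits at the corner (100, 10, 740) of the box
    return (x[0] - 100)**2 + (x[1] - 10)**2 + (x[2] - 740)**2

def toy_grad(x):
    # analytic gradient, so no numerical differentiation is needed
    return np.array([2*(x[0] - 100), 2*(x[1] - 10), 2*(x[2] - 740)])

bds = ((80, 100), (10, 20), (312, 740))
res = minimize(toy, [90, 15, 500], method='L-BFGS-B', bounds=bds, jac=toy_grad)
print(res.x)   # iterates stay inside the bounds; no finite-difference probes past 740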
Fix A:
Your problem should vanish automatically when using Pull Request #10673, which touches your configuration: L-BFGS-B.
It seems this PR is not part of the current release, SciPy 1.4.1 (which was cut about two months before the PR).
See also #6026, where a milestone of 1.5.0 is mentioned for related changes, including respecting bounds in numerical differentiation.
To use the above PR, you will need to install SciPy from source, which is:
quite doable on Linux (and maybe OS X)
not something you should try on Windows!
trust me...
See the documentation if needed.
Fix B:
Apart from that, since you are doing optimization with variable bounds only (no general constraints), for which more solver backends are available than for constrained optimization, you might try another solver, trust-constr, which has explicit support for this; see #9098.
Be careful to note that you need to signal this explicitly when setting up the bounds!
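A minimal sketch of Fix B, assuming the same myfunc and guess as above; the explicit signal mentioned above is presumably the keep_feasible flag on scipy.optimize.Bounds:

from scipy.optimize import minimize, Bounds

# per-variable lower/upper bounds matching bound1/bound2/bound3 above;
# keep_feasible=True asks the solver to keep every iterate inside the bounds
bds = Bounds([80, 10, 312], [100, 20, 740], keep_feasible=True)
result = minimize(myfunc, guess, method='trust-constr', bounds=bds)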
I recently began teaching myself Python, and have been using the language for an online course on algorithms. For some reason, many of the programs I wrote for this course are very slow (relative to C/C++/MATLAB code I have written in the past), and I'm starting to worry that I am not using Python properly.
Here is a simple python and matlab code to compare their speed.
MATLAB
for i = 1:100000000
    a = 1 + 1
end
Python
for i in list(range(0, 100000000)):
    a = 1 + 1
The MATLAB code takes about 0.3 seconds, and the Python code takes about 7 seconds. Is this normal? My Python code for much more complex problems is very slow. For example, as a homework assignment, I'm running depth-first search on a graph with about 900000 nodes, and it is taking forever. Thank you.
Performance is not an explicit design goal of Python:
Don’t fret too much about performance--plan to optimize later when needed.
That's one of the reasons why Python integrates with a lot of high-performance computational backends, such as NumPy, OpenBLAS and even CUDA, just to name a few.
The best way forward if you want to increase performance is to let high-performance libraries do the heavy lifting for you. Optimizing loops within Python (e.g. by using xrange instead of range in Python 2.7) won't give you very dramatic results.
Here is a bit of code that compares different approaches:
Your original list(range())
The suggested use of xrange()
Leaving the i out
Using numpy to do the addition with numpy arrays (vector addition)
Using CUDA to do vector addition on the GPU
Code:
import timeit
import matplotlib.pyplot as mplplt

iter = 100
testcode = [
    "for i in list(range(1000000)): a = 1+1",
    "for i in xrange(1000000): a = 1+1",
    "for _ in xrange(1000000): a = 1+1",
    "import numpy; one = numpy.ones(1000000); a = one+one",
    "import pycuda.gpuarray as gpuarray; import pycuda.driver as cuda; import pycuda.autoinit; import numpy;"
    "one_gpu = gpuarray.GPUArray((1000000),numpy.int16); one_gpu.fill(1); a = (one_gpu+one_gpu).get()"
]
labels = ["list(range())", "i in xrange()", "_ in xrange()", "numpy", "numpy and CUDA"]

# Time each snippet and plot the results as a bar chart
timings = [timeit.timeit(t, number=iter) for t in testcode]
print labels, timings

label_idx = range(len(labels))
mplplt.bar(label_idx, timings)
mplplt.xticks(label_idx, labels)
mplplt.ylabel('Execution time (sec)')
mplplt.title('Timing of integer addition in python 2.7\n(smaller value is better performance)')
mplplt.show()
Results (graph), run on Python 2.7.13 on OS X:
The reason that Numpy performs faster than the CUDA solution is that the overhead of using CUDA does not beat the efficiency of Python+Numpy. For larger, floating point calculations, CUDA does even better than Numpy.
Note that the Numpy solution performs more than 80 times faster than your original solution. If your timings are correct, this would even be faster than Matlab...
A final note on DFS (Depth-First Search): here is an interesting article on DFS in Python.
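For what it's worth, here is a minimal iterative DFS sketch (not from the linked article) that uses an explicit stack and so avoids Python's recursion limit, which matters for a graph of ~900000 nodes:

def dfs(graph, start):
    # graph is assumed to be a dict mapping node -> list of neighbours
    visited, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        stack.extend(n for n in graph[node] if n not in visited)
    return visited

print dfs({1: [2, 3], 2: [4], 3: [], 4: []}, 1)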
Try using xrange instead of range.
The difference between them is that xrange generates values lazily as you use them, whereas range builds the whole list up front at runtime.
Unfortunately, Python's amazing flexibility and ease of use come at the cost of speed. Also, for such large iteration counts, I suggest looking at the itertools module, whose iterators are implemented in C.
xrange is a good solution; however, if you want to iterate over dictionaries and other containers, itertools is more general, since it can iterate over any kind of iterable.
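A small sketch (Python 2.7) comparing the loop styles mentioned above; the exact numbers on your machine will differ:

import timeit

print timeit.timeit("for i in range(1000000): a = 1 + 1", number=10)
print timeit.timeit("for i in xrange(1000000): a = 1 + 1", number=10)
print timeit.timeit("for _ in itertools.repeat(None, 1000000): a = 1 + 1",
                    setup="import itertools", number=10)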
I have written some very simple trial-and-error code in Sage (a computer algebra system built on Python, so you can use regular Python syntax in scripts). The little snippet creates a polynomial and does some calculations with the coefficients; in particular it determines the Groebner basis of the ideal generated by three expressions in the coefficients.
The problem is: this program keeps eating memory until it is killed by the kernel. Every iteration consumes only about 200 kB, but this memory is never freed again.
Here is the code. The details are bulky and not that important, so they are left out:
R = PolynomialRing(QQ, 2, 'bc', order='lex')
expr1, expr2, expr3 = ...

for i in range(0, 50):
    for j in range(i+1, 50):
        for k in range(j+1, 50):
            for l in range(k+1, 50):
                for m in range(l+1, 50):
                    for n in range(m+1, 50):
                        poly = (x-i)*(x-j)*(x-k)*(x-l)*(x-m)*(x-n)
                        r = poly.coeffs()
                        p1 = expr1.substitute(r...)
                        p2 = expr2.substitute(r...)
                        p3 = expr3.substitute(r...)
                        I = (p1, p2, p3)*R
                        B = I.groebner_basis()
As far as I understand Python's memory management, the variables in the loop body should be freed every so often. So it may be a programming problem, an internal Python problem, or some problem in the Sage routines; I don't know. Can you spot a problem with my code, or is it something else?
The problem doesn't appear to be your loops (in python2.7, OS-X 10.5.8):
a = 0
for i in range(0, 50):
    for j in range(i+1, 50):
        for k in range(j+1, 50):
            for l in range(k+1, 50):
                for m in range(l+1, 50):
                    for n in range(m+1, 50):
                        a += 1
print(a)
This takes very little additional memory on both Python 2.x and Python 3.x.
And it really doesn't take all that long to run either:
time python test.py
15890700
real 0m6.015s
user 0m5.940s
sys 0m0.032s
Perhaps something is funky when running with sage? Or maybe it's something else in your loops that is causing the problem...
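If it helps, here is a minimal, not Sage-specific sketch for narrowing down which statement leaks: print the process's peak RSS every so often and comment the lines of your loop body in and out until the growth disappears.

import resource

def peak_rss():
    # ru_maxrss is reported in kilobytes on Linux, bytes on OS X
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

count = 0
for i in range(0, 50):
    for j in range(i + 1, 50):
        count += 1
        if count % 100 == 0:
            print("%d iterations, peak RSS %d" % (count, peak_rss()))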
The origin of the bug might be the call method on multivariate polynomials. Something as innocent as:
for i in xrange(really_big_number):
    polynomial(1, 0, 0, 0) == 0
will explode.
This might either happen in p1 = expr1.substitute(r...) or well inside the algorithm for the Groebner Basis.
I think I may have implemented this incorrectly because the results do not make sense. I have a Go program that counts to 1000000000:
package main

import (
    "fmt"
)

func main() {
    for i := 0; i < 1000000000; i++ {
    }
    fmt.Println("Done")
}
It finishes in less than a second. On the other hand I have a Python script:
x = 0
while x < 1000000000:
    x += 1
print 'Done'
It finishes in a few minutes.
Why is the Go version so much faster? Are they both counting up to 1000000000 or am I missing something?
One billion is not a very big number. Any reasonably modern machine should be able to do this in a few seconds at most, if it's able to do the work with native types. I verified this by writing an equivalent C program, reading the assembly to make sure that it actually was doing addition, and timing it (it completes in about 1.8 seconds on my machine).
Python, however, doesn't have a concept of natively typed variables (or meaningful type annotations at all), so it has to do hundreds of times as much work in this case. In short, the answer to your headline question is "yes". Go really can be that much faster than Python, even without any kind of compiler trickery like optimizing away a side-effect-free loop.
PyPy actually does an impressive job of speeding up this loop:
import time

def main():
    x = 0
    while x < 1000000000:
        x += 1

if __name__ == "__main__":
    s = time.time()
    main()
    print time.time() - s
$ python count.py
44.221405983
$ pypy count.py
1.03511095047
~97% speedup!
Clarification for those who didn't get it: the Python language itself isn't slow. CPython is a relatively straightforward implementation of the language. PyPy is another implementation that does many tricky things (especially the JIT) that can make enormous differences. Directly answering the question in the title: Go isn't "that much" faster than Python; Go is that much faster than CPython.
Having said that, the code samples aren't really doing the same thing. Python needs to instantiate 1000000000 of its int objects. Go is just incrementing one memory location.
This scenario will highly favor decent natively-compiled, statically-typed languages. Such languages are capable of emitting a very trivial loop of, say, 4-6 CPU opcodes that uses a simple condition check for termination. This loop has effectively zero branch-prediction misses and can effectively be thought of as performing an increment every CPU cycle (this isn't entirely true, but...).
Python implementations have to do significantly more work, primarily due to dynamic typing. Python must make several calls (internal and external) just to add two ints together. In Python, the increment is effectively i = i.__add__(1), which in turn has to check the type of the value passed (to make sure it is an int), then add the integer values (extracting them from both objects), wrap the new integer value up again in a new object, and finally re-assign the new object to the local variable. That's significantly more work than a single opcode to increment, and it doesn't even address the loop itself; by comparison, the Go/native version is likely just incrementing a register by side-effect.
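A tiny illustration of that point (not from the original answer): every addition on a large int allocates a brand-new object, which you can observe via id().

i = 10**10               # large enough to sit outside CPython's small-int cache
j = i.__add__(1)         # roughly what "i + 1" dispatches to under the hood
print(id(i) != id(j))    # True: the result is a freshly allocated int object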
Java will fare much better in a trivial benchmark like this and will likely be fairly close to Go; the JIT and static typing of the counter variable can ensure this (it uses a special integer-add JVM instruction). Once again, Python has no such advantage. Now, there are some implementations like PyPy/RPython, which run a static-typing phase and should fare much better than CPython here...
You've got two things at work here. The first is that Go is compiled to machine code and runs directly on the CPU, while Python is compiled to bytecode run against a (particularly slow) VM.
The second, and more significant, thing impacting performance is that the semantics of the two programs are actually significantly different. The Go version makes a "box" called "x" that holds a number and increments it by 1 on each pass through the loop. The Python version actually has to create a new "box" (an int object) on each cycle (and, eventually, has to throw them away). We can demonstrate this by modifying your programs slightly:
package main

import (
    "fmt"
)

func main() {
    for i := 0; i < 10; i++ {
        fmt.Printf("%d %p\n", i, &i)
    }
}
...and:
x = 0
while x < 10:
    x += 1
    print x, id(x)
This is because Go, due to its C roots, takes a variable name to refer to a place, whereas Python takes variable names to refer to things. Since an integer is considered a unique, immutable entity in Python, we must constantly make new ones. Python should be slower than Go, but you've picked a worst-case scenario: in the Benchmarks Game, we see Go being, on average, about 25x faster (100x in the worst case).
You've probably read that, if your Python programs are too slow, you can speed them up by moving things into C. Fortunately, in this case, somebody's already done this for you. If you rewrite your empty loop to use xrange() like so:
for x in xrange(1000000000):
    pass

print "Done."
...you'll see it run about twice as fast. If you find loop counters to actually be a major bottleneck in your program, it might be time to investigate a new way of solving the problem.
@troq
I'm a little late to the party, but I'd say the answer is yes and no. As @gnibbler pointed out, CPython is slower in the simple implementation, but PyPy is JIT-compiled for much faster code when you need it.
If you're doing numeric processing with CPython, most people do it with numpy, resulting in fast operations on arrays and matrices. Recently I've been doing a lot with numba, which allows you to add a simple wrapper to your code. For this one I just added @njit to a function incALot() which runs your code above.
On my machine CPython takes 61 seconds, but with the numba wrapper it takes 7.2 microseconds, which will be similar to C and maybe faster than Go. That's an 8-million-times speedup.
So, in Python, if things with numbers seem a bit slow, there are tools to address it - and you still get Python's programmer productivity and the REPL.
import time
from numba import njit

def incALot(y):
    x = 0
    while x < y:
        x += 1

@njit('i8(i8)')
def nbIncALot(y):
    x = 0
    while x < y:
        x += 1
    return x

size = 1000000000

start = time.time()
incALot(size)
t1 = time.time() - start

start = time.time()
x = nbIncALot(size)
t2 = time.time() - start

print('CPython3 takes %.3fs, Numba takes %.9fs' % (t1, t2))
print('Speedup is: %.1f' % (t1/t2))
print('Just Checking:', x)
CPython3 takes 58.958s, Numba takes 0.000007153s
Speedup is: 8242982.2
Just Checking: 1000000000
The problem is that Python is interpreted and Go isn't, so there's no real way to benchmark the two on equal footing. Interpreted languages usually (though not always) have a VM component, and that's where the problem lies: any test you run is executed inside the interpreter rather than as native code. Go is slightly slower than C in terms of speed, and that is mostly due to it using garbage collection instead of manual memory management. That said, Go compared to Python is fast because it's a compiled language; the only thing lacking in Go is bug testing - I stand corrected if I'm wrong.
It is possible that the compiler realized that you didn't use the "i" variable after the loop, so it optimized the final code by removing the loop.
Even if you used it afterwards, the compiler is probably smart enough to substitute the loop with
i = 1000000000;
Hope this helps =)
I'm not familiar with Go, but I'd guess that the Go version ignores the loop since the body of the loop does nothing. On the other hand, in the Python version, you are incrementing x in the body of the loop, so it's probably actually executing the loop.
Given log(a) and log(b), I want to compute log(a+b) (in a numerically stable way).
I wrote a little function for this:
import math
from numpy import log  # assuming numpy's log, whose log(0) evaluates to -inf (with a warning) instead of raising

def log_add(logA, logB):
    if logA == log(0):
        return logB
    if logA < logB:
        return log_add(logB, logA)
    return log(1 + math.exp(logB - logA)) + logA
I wrote a program where this is by far the most time-consuming piece of code. Obviously I could try to optimize it (eliminate the recursive call, for instance).
Do you know of a standard math or numpy function for computing log(a+b) from log(a) and log(b)?
If not, do you know of a simple way to make a single C++ hook for this function? It's not a complicated function (it uses floats), and as I said, it's taking up the majority of my runtime.
Thanks in advance, numerical methods ninja!
Note: the best answer so far is to simply use numpy.logaddexp(logA, logB).
Why exactly do you compare with log(0)? This is equal to -numpy.inf, in which case you end up with log(1 + math.exp(-inf - logB)) + logB, which reduces to logB anyway. The log(0) call also emits a warning message every time, which is extremely slow.
I came up with this one-liner; however, you'll need to actually measure whether it is faster. It uses only one 'complex' calculation function instead of the two you use, no recursion happens, and the if is still there, but hidden (and maybe optimized) inside fabs/maximum.
import numpy

def log_add(logA, logB):
    return numpy.logaddexp(0, -numpy.fabs(logB - logA)) + numpy.maximum(logA, logB)
Edit:
I did a quick timeit() with the following results:
Your original version took about 120s
My version took about 30s
I removed the comparison with log(0) from your version and it came down to 20s
I edited my code to keep the logaddexp but also use your recursive if, and it went down to 18s.
Updated code (you could also replace the recursive call with an inline formula, but this made little difference in my timing tests):
def log_add2(logA, logB):
    if logA < logB:
        return log_add2(logB, logA)
    return numpy.logaddexp(0, logB - logA) + logA
Edit 2:
As pv noted in the comments, you can actually just do numpy.logaddexp(logA, logB), which computes log(exp(logA) + exp(logB)) and is of course equal to log(A+B). I timed it (on the same machine as above) and it went further down to about 10s. So we've come down to about 1/12 of the original time; not bad ;).
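A quick sanity check (not from the original answer) that numpy.logaddexp agrees with the direct computation for ordinary magnitudes:

import math
import numpy

logA, logB = math.log(3.0), math.log(5.0)
print(numpy.logaddexp(logA, logB))    # log(3 + 5), computed stably
print(math.log(3.0 + 5.0))            # same value, computed directly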
import math

def log_add(logA, logB):
    return math.log(math.exp(logA) + math.exp(logB))

Is this too slow? Or inaccurate?