I'm interested in programming languages that can reason about their own time complexity. To this end, it would be quite useful to have some way of representing time complexity programmatically, which would allow me to do things like:
f_time = O(n)
g_time = O(n^2)
h_time = O(sqrt(n))
fastest_asymptotically = min(f_time, g_time, h_time) # = h_time
total_time = f_time.inside(g_time).followed_by(h_time) # = O(n^3)
I'm using Python at the moment, but I'm not particularly tied to a language. I've experimented with sympy, but I haven't been able to find what I need out of the box there.
Is there a library that provides this capability? If not, is there a simple way to do the above with a symbolic math library?
EDIT: I wrote a simple library following #Patrick87's advice and it seems to work. I'm still interested if there are other solutions for this problem, though.
SymPy currently only supports the expansion at 0 (you can simulate other finite points by performing a shift). It doesn't support the expansion at infinity, which is what is used in algorithmic analysis.
But it would be a good base package for it, and if you implement it, we would gladly accept a patch (nb: I am a SymPy core developer).
Be aware that in general the problem is tricky, especially if you have two variables, or even symbolic constants. It's also tricky if you want to support oscilitory functions. EDIT: If you are interested in oscillating functions, this SymPy mailing list discussion gives some interesting papers.
EDIT 2: And I would recommend against trying to build this yourself from scratch, without the use of a computer algebra system. You will end up having to write your own computer algebra system, which is a lot of work, and even more work if you want to do it right and not have it be slow. There are already tons of systems already existing, including many that can act as libraries for code to be built on top of them (such as SymPy).
Actually you are building/finding a Expression Simplifier which can deal with:
+ (in your terms: followed_by)
***** (in your terms: inside)
^, log, ! (to represent the complexity)
variable (like n,m)
constant number (like that in 2^n)
For example, as you given f_time.inside(g_time).followed_by(h_time), It could be an expression like:
n*(n^2)+(n^(1/2))
, and you are expecting an processer to make it output as:n^3.
So in general speaking, you might want to use a common expression simplifier (if you want it to be interesting, go to check how Mathemetica does it) to get a simplified expression like n^3+n^(1/2), and then you need an additional processor to choose the term with highest complexity from the expression and get rid of the other terms. That would be easy, just use a table to define the complexity order of each kind of symbol.
Please note that in this case, the expressions are just symbol, you should write it as something like string (For your example: f_time = "O(n)"), not as functions.
If you're only working with big-O notation and are interested in whether one function grows more or less quickly than another, asymptotically speaking...
Given functions f and g
Compute the limit as n goes to infinity of f(n)/g(n) using a computer algebra package
If the limit diverges to +infinity, then f > g - in the sense that g = O(f), but f != O(g).
If the limit diverges to 0, then g < f.
If the limit converges to a finite number, then f = g (in the sense that f = O(g) and g = O(f))
If the limit is undefined... beats me!
Related
It is often said that compiled languages such as C perform better than interpreted languages such as Python. Therefore, I might be interested in migrating Python implementations to C/C++ (assuming they also have access to some Z3 API that is in use and maintenance).
However, this migration only makes sense in one case: if my performance loss is due to the language and not due to Z3. Therefore, I would like to know if there is any way to know what percentage of execution is being executed by Z3 and what percentage by pure Python.
A very naive possibility would be to use a timer just before and after each call to Z3 in my implementation and add up those times to finally see how much of the total those times represent. A sketch of this idea (pseudo-code):
timer_total = 0
time_z3 = 0
while executing:
time_local = take_time()
call_z3()
time_z3 += take_time() - time_local
print(time_z3/time_total)
This, even though it is an ugly solution, would answer my first question: how long does Z3 take over the total execution.
However, I would want to get even more information, if possible: I want to know not only how long Z3 takes to do its computations, but also whether using Python causes Z3 to have to do large data transformations before the information arrives "pure" (i.e., as if I wrote it in Z3) to Z3 and that, therefore, Z3's time has been considerably more than it would be if it didn't have to do them. In other words: I would like to know how long Z3 is only with the part of doing the logical calculations (not transformations and other processes), but only looking for models.
Specifically: I want to know if other languages like C++ do these transformations cheaper and therefore is the Z3 API of some other language more recommended/effective/optimized than Python.
I know it's abstract, but I hope the question was understood, and if not, we can discuss it in the comments.
Recently I have been revising some of my old python codes, which are essentially loops of algebra, in order to have them execute faster, generally by eliminating some un-necessary operations. Often, changing the value of an entry in a list from 0 (as a python float, which I believe is a double by default) to the same value, which is obviously not necessary. Or, checking if a float is equal to something, when it MUST be that thing, because a preceeding "if" would not have triggered if it wasn't, or some other extraneous operation. This got me wondering about what will preserve my battery more, as I do a some of my coding on the bus where I can't plug my laptop in.
For example, which of the following two operations would be expected to use less battery power?
if b != 0: #b was assigned previously, and I know it is zero already
b = 0
or:
b = 0
The first one checks if b is zero, and it is, so it doesn't do the next part. The second one just assigns b to zero without bothering to check. I believe the first one is more time-efficient, as you don't have to change anything in memory. Is that correct, and if so, would it also be more power-efficient? Does "more time efficient" always imply "more power efficient"?
I suggest watching this talk by Chandler Carruth: "Efficiency with Algorithms, Performance with Data Structures"
He addresses the idea of "Power efficient instructions" at 4m 49s in the video. I agree with him, thinking about how much watt particular code consumes is useless. As he put it
Q: "How to save battery life?"
A: "Finish ruining the program".
Also, in Python you do not have low level control to be even thinking about low level problems like this. Use appropriate data structures and algorithms, and pray that Python interpreter will give you well optimized byte-code.
Does every simple mathematical operation use the same amount of power (as in, battery power)?
No. It's not the same to compute a two number addition than a fourier transform of a 20 megapixel photo.
I believe the first one is more time-efficient, as you don't have to change anything in memory. Is that correct, and if so, would it also be more power-efficient? Does "more time efficient" always imply "more power efficient"?
Yes. You are right on your intuitions but these are very trivial examples. And if you dig deeper you will get into uncharted territory of weird optimization that's quite difficult to grasp (e.g., see this question: Times two faster than bit shift?)
In general the more your code utilizes system resources the greater power those resources would use. However it is more useful to optimize your code based on time or size constraints instead of thinking about high level code in terms of power draw.
One way of doing this is Big O notation. In essence, Big O notation is a way of comparing the size and or runtime complexity of an algorithm. https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
A computer at its lowest level is large quantity of transistors which require power to change and maintain their state.
It would be extremely difficult to anticipate how much power any one line of python code would draw.
I once had questions like this. Still do sometimes. Here's the answer I wish someone told me earlier.
Summary
You are correct that generally, if your computer does less work, it'll use less power.
But we have to be really careful in figuring out which logical operations involve more work and which ones involve less work - in this case:
Reading vs writing memory is usually the same amount of work.
if and any other conditional execution also costs work.
Python's "simple operations" are not "simple operations" for the CPU.
But the idea you had is probably correct for some cases you had in mind.
If you're concerned about power consumption, measure where power is being used.
For some perspective: You're asking about which Python code costs you one more drop of water, but really in Python every operation costs a bucket and your whole Python program is using a river and your computer as a whole is using an ocean.
Direct Answers
Don't apply these answers to Python yet. Read the rest of the answer first, because there's so much indirection between Python and the CPU that you'll mislead yourself about how they're connected if you don't take that into account.
I believe the first one is more time-efficient, as you don't have to change anything in memory.
As a general rule, reading memory is just as slow as writing to memory, or even slower depending on exactly what your computer is doing. For further reading you'll want to look into CPU memory cache levels, memory access times, and how out-of-order execution and data dependencies factor into modern CPU architectures.
As a general rule, the if statement in a language is itself an operation which can have a non-negligible cost. For further reading you should look into how CPU pipelining relates to branch prediction and branch penalties. Also look into how if statements are implemented in typical CPU instruction sets.
Does "more time efficient" always imply "more power efficient"?
As a general rule, more work efficient (doing less work - less machine instructions, for example) implies more power efficient, because on modern hardware (this wasn't always this way) your hardware will use less power when it's not doing anything.
You should be careful about the idea of "more time efficient" though, because modern hardware doesn't always execute the same amount of work in the same amount of time: for further reading you'll want to look into CPU frequency scaling, ARM's big.LITTLE architectures, and discussions about the "Race to Idle" concept as a starting point.
"One Simple Operation" - CPU vs. Python
Your question is about Python, so it's important to realize that Python's x != 0, if, and x = 0 do not map directly to simple operations in the CPU.
For further reading, especially if you're familiar with C, I would recommend taking a long look at how Python is implemented. There are many implementations - the main one is CPython, which is a C program that reads and interprets Python source, converts it into Python "bytecode" and then when running interprets that bytecode one by one.
As a baseline, if you're using Python, any one "simple" operation is actually a lot of CPU operations, as each step in the Python interpreter is multiple CPU operations, but which ones cost more might be surprising.
Let's break down the three used in our example (I'm primarily describing this from the perspective of the main Python implementation written in C, called "CPython", which I am the most closely familiar with, but in general this explanation is roughly applicable to all of them, though some will be able to optimize out certain steps):
x != 0
It looks like a simple operation, and if this was C and x was an int it would be just one machine instruction - but Python allows for operator overloading, so first Python has to:
look up x (at least one memory read, but may involve one or more hashmap lookups in Python's internals, which is many machine operations),
check the type of x (more memory reading),
based on the type look up a function pointer that implements the not-equality operation (one or arbitrarily many memory reads and one or more arbitrarily many conditional branches, with data dependencies between them),
only then it can finally call that function with references to Python objects holding the values of x and 0 (which is also not "free" - look up function calling ABI for more on that).
All that and more has to be done by the CPU even if x is a Python int or float mapping closely to the CPU's native numerical data types.
x = 0
An assignment is actually far cheaper in Python (though still not trivial): it only has to get as far as step 1 above, because once it knows "where" x is, it can just overwrite that pointer with the pointer to the Python object representing 0.
if
Abstractly speaking, the Python if statement has to be able to handle "truthy" and "falsey" values, which in the most naive implementation would involves running through more CPU instructions to evaluate what result of the condition is according to Python's semantics of what's true and what's false.
Sidenote About Optimizations
Different Python implementations go to different lengths to get Python operations closer to as few CPU operations in possible. For example, an optimizing JIT (Just In Time) compiler might notice that, inside some loop on an array, all elements of the array are native integers and actually reduce the if x != 0 and x = 0 parts into their respective minimal machine instructions, but that only happens in very specific circumstances when the optimizing logic can prove that it can safely bypass a lot of the behavior it would normally need to do.
The biggest thing here is this: a high-level language like Python is so removed from the hardware that "simple" operations are often complex "under the covers".
What You Asked vs. What I Think You Wanted To Ask
Correct me if I'm wrong, but I suspect the use-case you actually had in mind was this:
if x != 0:
# some code
x = 0
vs. this:
if x != 0:
# some code
x = 0
In that case, the first option is superior to the second, because you are already paying the cost of if x != 0 anyway.
Last Point of Emphasis
The hardest breakthrough for me was moving away from trying to reason about individual instructions in my head, and instead switching into looking at how things work and measuring real systems.
Looking at how things work will teach you how to optimize, but measuring will show you where to optimize.
This question is great for exploring the former, but for your stated motivation of reducing power consumption on your laptop, you would benefit more from the latter.
I have a complex function which includes very (very) large numbers, and i optimize the function with scipy.minimize.
A long time ago when i implemented the function i used numpy.float128() numbers, because i thought it can handle big numbers better.
However i attended a course, and learned that python ints (and floats i guess) can be arbitrary large.
I changed my code to use simple integers, (changed the initialization from a = np.float128() to a = 0 ) and surprisingly the very same function has a different optimum if i use a = 0 and a = np.float128, If i run the minimization with f.e. a = np.float128() 10 times, i get the same results. I use SLSQP method for optimization with bounds.
The code is complex, and i think it is not required to answer my question, but in case needed i can provide it.
So how can this happen? Which type should i use? Is this some kind of a precision error?
I'm playing around with SymPy and it is very powerful. However, I would like to get it to 'slow down' and solve pieces of an equation at a time instead of most of the equation. For instance, given an input string equation (assuming the correct form) like
9x-((17-3)(4x)) - 8(34x)
I would like to first solve
9x-((14)(4x)) - 8(34x)
And then
9x-(56x) - 8(34x)
and then
9x-(56x) - 272x
And so on.
Another example,
from sympy import *
s = (30*(5*(5-10)-10*x))+10
s2 = expand(s, basic=False)
Gives me -300*x - 740 in one step, and I just want a single * done at a time
Reading the ideas document produced as a result of the Google Summer of Code, this appears to be something yet to be added to the library. As it stands there is no way of doing this for your example, without completely coding something yourself.
The issue of converting algorithms that are not equivalent to human workings, into discrete steps, is discussed and highlighted in the above document. I'm not sure if that'd be an issue in the implementation of expansion, but it's certainly an issue for other algorithms, which machines compute differently for reasons of efficiency.
tl;dr This library doesn't support step-by-step breakdowns for your example. Only the manualintegrate function currently has step-by-step workings.
I have systems of polynomials, fairly simple polynomial expressions but rather long
to optimize my hand. Expressions are grouped in sets, and in a given set there are common terms in several variables.
I would like to know if there is a computer algebra system, such as Mathematica, Matlab, or sympy, which can optimize multiple polynomials with common terms to minimize number of operations. It would be also great if such system can minimize the number of intermediate terms to reduce number of registers.
If such system is not existing, I am going to do my own, using Python symbolic algebra Sympy. If you are working on such package or are interested in developing or using one, please let me know.
here is a made-up example
x0 = ((t - q*A)*x + B)*y
y0 = ((t - q*A)*y + B)*z
z0 = ((t - q*A)*z + B)*x
so you can obviously factor the (t - qA) term. Now if you make number of terms very large with various combinations of common terms it becomes difficult to do by hand. The equations I have involve up to 40 terms and the size of set is around 20. Hope that helps
Thank you
Is sympy what you're looking for? I do believe it has support for polynomials although I don't know if it supports all the features you may desire (still, tweaking it to add what you think it might be missing has to be easier than writing your own from scratch;-).
Have you considered Maxima?
It is an impressive symbolical computation package that is free, open source, and has a strong and active community that provides valuable assistance when dealing with non-obvious formulations. It is readily availability for all three major operating systems, and has a precompiled Windows binary.
You have a variety of algebraic manipulation commands available for expressions and for systems of equations (such as yours): expand, factor, simplify, ratsimp, linsolve, etc.
This page (Maxima for Symbolic Computation)should get you started — downloading, installing, a few examples, and then pointing out additional resources to guide you on your way, including a quick command reference / cheat sheet, and some guidlines for writing your own scripts.
Well Mathematica can certainly do all sorts of transformations on sets of polynomial equations such as yours, and some of those transformations could be to reduce the number of terms. Whether that is the right answer for you is open to question, as you don't seem to have a copy available. I expect that the same is true for Maple and for most of the other CAS out there.
But your mention of
reduce number of registers
suggests that you are actually trying to do some data-flow analysis for compilation. You might want to look at the literature on that topic too. Some of that literature does indeed refer to computer-algebra-like transformations on expressions.
I'm late to the party, but anyway there is a function optimize in Maxima (https://maxima.sourceforge.io) which identifies common subexpressions and emits a blob of code which can be evaluated. For the example shown in the problem statement, I get:
(%i11) optimize ([x0 = ((t-A*q)*x+B)*y,
y0 = ((t-A*q)*y+B)*z,
z0 = x*((t-A*q)*z+B)]);
(%o11) block([%1],
%1 : t - A q,
[x0 = (%1 x + B) y,
y0 = (%1 y + B) z,
z0 = x (%1 z + B)])
As you can see, t - A*q was pulled out and assigned to a made-up variable %1 (percent sign being an allowed character for symbols in Maxima) which is then reused to compute other results.
? optimize at the input prompt shows some documentation about it.