When required to show how efficient an algorithm is, we need to state its algorithmic complexity - Big O and so on. For Python code, how can we determine or calculate these bounds for our functions?
In general, there's no way to do this programmatically (you run into the halting problem).
If you have no idea where to start, you can gain some insight into how a function will perform by running some benchmarks (e.g. using the time module) with inputs of various sizes. You can even collect enough data to form a suspicion about what the runtime might be. But this won't give you a rigorous answer - for that, you need to prove mathematically that your suspected bound is in fact true.
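For example, a rough benchmarking sketch along those lines might look like this (my_sort is just a hypothetical stand-in for whatever function you're analysing):

import random
import time

def my_sort(data):  # stand-in for the function being analysed
    return sorted(data)

for n in (1000, 2000, 4000, 8000):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    my_sort(data)
    elapsed = time.perf_counter() - start
    print(n, elapsed)

If the measured time roughly quadruples every time n doubles, that's a hint (not a proof) that you're looking at quadratic growth.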
For instance, if I'm playing with a sorting function and observe that the time is increasing roughly proportionally to the square of the input size, I might suspect that the complexity of this sort is O(n**2). But this does not constitute proof - in particular, some algorithms that perform well under typical inputs have pathological inputs that result in very poor performance.
To prove that the bound is in fact O(n**2), I need to look at what the algorithm is doing in the worst case - in this example, I might be analysing a selection sort, which repeatedly sweeps across the entire unsorted portion of the list and picks the lowest unsorted number. It should be evident that I'm examining something like n*(n-1) == O(n**2) elements. If examining elements is a constant-time operation, and placing the final element in the correct place is also not worse than O(n**2), then it follows that my entire algorithm is O(n**2).
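As a rough sketch (a textbook selection sort, not necessarily the exact variant under discussion):

def selection_sort(items):
    items = list(items)
    n = len(items)
    for i in range(n):
        # scan the entire unsorted portion for the smallest element:
        # n-1, then n-2, ... comparisons, about n*(n-1)/2 in total -> O(n**2)
        smallest = i
        for j in range(i + 1, n):
            if items[j] < items[smallest]:
                smallest = j
        # placing it is a constant-time swap, so the total stays O(n**2)
        items[i], items[smallest] = items[smallest], items[i]
    return items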
If you're trying to get the big O notation for your own functions, you probably need variables keeping track of things like the runtime, the number of comparisons, the number of iterations, and so on, as well as some calculation investigating how these correspond to the size of your data.
It's probably best to do this manually first, so you can check your understanding of an algorithm.
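One lightweight way to do that bookkeeping without modifying the algorithm itself is to wrap the values in a small class that counts comparisons (Counted is just an illustrative name, and sorted() here stands in for whatever sort you're measuring):

import random

class Counted:
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

for n in (100, 200, 400, 800):
    Counted.comparisons = 0
    sorted(Counted(random.random()) for _ in range(n))
    # compare how the count grows against n, n*log(n) and n**2
    print(n, Counted.comparisons)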
For the sake of argument, consider the following (very bad) sorting algorithm in Python:
def so(ar):
    while True:
        le = len(ar)
        switch = False
        for y in range(le):
            if y+1 == le:
                break
            if ar[y] > ar[y+1]:
                ar[y], ar[y+1] = ar[y+1], ar[y]
                switch = True
        if switch == False:
            break
    return ar
I'm trying to understand the concept of "complexity of the algorithm" and there is one thing I don't get.
I came across the post that explains how to find the complexity of the algorithm here:
You add up how many machine instructions it will execute as a function
of the size of its input, and then simplify the expression to the
largest (when N is very large) term and can include any simplifying
constant factor.
But the problem is that I cannot calculate how many machine instructions will be executed just by knowing the length of the list.
Consider the first example:

import random
import time

li = [random.randint(1, 5000) for x in range(3000)]

start = time.time()
so(li)
end = time.time() - start
print(end)
Output: 2.96921706199646
Now have a look at the second example:
ok = [5000,43000,232] + [x for x in range(2997)]
start = time.time()
so(ok)
end = time.time() - start
print(end)
Output: 0.010689020156860352
So we have the same sorting algorithm and two different lists of the same length, yet two completely different execution times.
When people talk about algorithm complexity (big O notation), they normally assume that the only variable determining the complexity of the algorithm is the size of the input, but clearly, in the example above, that is not the case. It is not only the size of the list but also the positioning of each value within the list that determines the speed of the sorting.
So my question is, why do we only consider size of input when estimating complexity?
And, if it is possible, can you tell me what the complexity of the algorithm above will be?
You're correct, complexity doesn't only depend on N. That's why you'll often see indications about average, worst and best cases.
Timsort is used in Python because it's O(n log n) on average, still fast in the worst case (O(n log n)) and extremely fast in the best case (O(n), when the list is already sorted).
Quicksort also has an average complexity of O(n log n), but its worst case is O(n²), which with a naive pivot choice (e.g. always picking the first element) is hit when the list is already sorted. Since that use case happens very often, it can be worth actually shuffling the list before sorting it!
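A quick (unscientific) way to see the best-case behaviour for yourself - sorting an already-sorted list of a million elements versus a shuffled copy of the same list:

import random
import time

data_sorted = list(range(1_000_000))
data_random = data_sorted[:]
random.shuffle(data_random)

for name, data in (("already sorted", data_sorted), ("shuffled", data_random)):
    start = time.perf_counter()
    sorted(data)
    print(name, time.perf_counter() - start)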
why do we only consider size of input when estimating complexity?
In the narrow sense of complexity as of the use of Big O notation in computer science, it is simply by definition:
In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows.
In the broader sense your question could be interpreted as "why do we use Big O notation to describe algorithm complexity when the nature of the data can be just as important as its size."
The answer here lies in the fact that algorithm development is often done on small datasets to make it easy, while in the real world the datasets are huge. When you are writing your sorting function you're most likely going to try it first on small lists of random data. You'd want the result small enough that you can verify that it worked by simply looking at the result...
Time complexity does not always depend only on the size of the input; for many sorting algorithms, the input pattern plays a significant role in determining the running time.
We usually state time complexity in terms of worst, best and average cases, and we can study how specific input orders/patterns lead to the best, average or worst case.
For example, in the first case you provided, the input is randomized, so each particular ordering occurs with probability 1/n!. The best case (when the list is already sorted) is Ω(n) and the worst case (when the list is reverse-sorted) is O(n²), but the probability of hitting the best or worst case is low.
Therefore, the sorting algorithm has Θ(n²) average time complexity, since with randomly distributed numbers the probability that a comparison leads to a swap is high in the average case.
In the second case, the order is nearly fixed, which means the input is very likely to be close to either the best or the worst case. In your case it is close to the best case, hence the shorter time.
Recently I have been revising some of my old Python code, which is essentially loops of algebra, in order to have it execute faster, generally by eliminating unnecessary operations. For example: changing the value of an entry in a list from 0 (as a Python float, which I believe is a double by default) to the same value, which is obviously not necessary; or checking whether a float is equal to something when it MUST be that thing, because a preceding "if" would not have triggered otherwise; or some other extraneous operation. This got me wondering about what will preserve my battery more, as I do some of my coding on the bus where I can't plug my laptop in.
For example, which of the following two operations would be expected to use less battery power?
if b != 0:  # b was assigned previously, and I know it is zero already
    b = 0
or:
b = 0
The first one checks whether b is nonzero, and since it isn't, it skips the assignment. The second one just assigns b to zero without bothering to check. I believe the first one is more time-efficient, as you don't have to change anything in memory. Is that correct, and if so, would it also be more power-efficient? Does "more time efficient" always imply "more power efficient"?
I suggest watching this talk by Chandler Carruth: "Efficiency with Algorithms, Performance with Data Structures"
He addresses the idea of "power-efficient instructions" at 4m 49s in the video. I agree with him: thinking about how many watts a particular piece of code consumes is useless. As he put it:
Q: "How to save battery life?"
A: "Finish running the program."
Also, in Python you do not have the low-level control needed to even think about low-level problems like this. Use appropriate data structures and algorithms, and pray that the Python interpreter gives you well-optimized bytecode.
Does every simple mathematical operation use the same amount of power (as in, battery power)?
No. Adding two numbers is not the same as computing the Fourier transform of a 20-megapixel photo.
I believe the first one is more time-efficient, as you don't have to change anything in memory. Is that correct, and if so, would it also be more power-efficient? Does "more time efficient" always imply "more power efficient"?
Yes. Your intuitions are right, but these are very trivial examples, and if you dig deeper you will get into the uncharted territory of weird optimizations that are quite difficult to grasp (e.g., see this question: Times two faster than bit shift?).
In general, the more your code utilizes system resources, the more power those resources will use. However, it is more useful to optimize your code for time or size constraints than to think about high-level code in terms of power draw.
One way of doing this is Big O notation. In essence, Big O notation is a way of comparing the space and/or runtime complexity of an algorithm. https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
A computer at its lowest level is large quantity of transistors which require power to change and maintain their state.
It would be extremely difficult to anticipate how much power any one line of python code would draw.
I once had questions like this. Still do sometimes. Here's the answer I wish someone told me earlier.
Summary
You are correct that generally, if your computer does less work, it'll use less power.
But we have to be really careful in figuring out which logical operations involve more work and which ones involve less work - in this case:
Reading vs writing memory is usually the same amount of work.
if and any other conditional execution also costs work.
Python's "simple operations" are not "simple operations" for the CPU.
But the idea you had is probably correct for some cases you had in mind.
If you're concerned about power consumption, measure where power is being used.
For some perspective: You're asking about which Python code costs you one more drop of water, but really in Python every operation costs a bucket and your whole Python program is using a river and your computer as a whole is using an ocean.
Direct Answers
Don't apply these answers to Python yet. Read the rest of the answer first, because there's so much indirection between Python and the CPU that you'll mislead yourself about how they're connected if you don't take that into account.
I believe the first one is more time-efficient, as you don't have to change anything in memory.
As a general rule, reading memory is just as slow as writing to memory, or even slower depending on exactly what your computer is doing. For further reading you'll want to look into CPU memory cache levels, memory access times, and how out-of-order execution and data dependencies factor into modern CPU architectures.
As a general rule, the if statement in a language is itself an operation which can have a non-negligible cost. For further reading you should look into how CPU pipelining relates to branch prediction and branch penalties. Also look into how if statements are implemented in typical CPU instruction sets.
Does "more time efficient" always imply "more power efficient"?
As a general rule, being more work-efficient (doing less work - fewer machine instructions, for example) implies being more power-efficient, because on modern hardware (this wasn't always the case) your hardware uses less power when it's not doing anything.
You should be careful about the idea of "more time efficient" though, because modern hardware doesn't always execute the same amount of work in the same amount of time: for further reading you'll want to look into CPU frequency scaling, ARM's big.LITTLE architectures, and discussions about the "Race to Idle" concept as a starting point.
"One Simple Operation" - CPU vs. Python
Your question is about Python, so it's important to realize that Python's x != 0, if, and x = 0 do not map directly to simple operations in the CPU.
For further reading, especially if you're familiar with C, I would recommend taking a long look at how Python is implemented. There are many implementations - the main one is CPython, which is a C program that reads and interprets Python source, converts it into Python "bytecode" and then when running interprets that bytecode one by one.
As a baseline, if you're using Python, any one "simple" operation is actually a lot of CPU operations, as each step in the Python interpreter is multiple CPU operations, but which ones cost more might be surprising.
Let's break down the three used in our example (I'm primarily describing this from the perspective of the main Python implementation written in C, called "CPython", which I am the most closely familiar with, but in general this explanation is roughly applicable to all of them, though some will be able to optimize out certain steps):
x != 0
It looks like a simple operation, and if this was C and x was an int it would be just one machine instruction - but Python allows for operator overloading, so first Python has to:
look up x (at least one memory read, but may involve one or more hashmap lookups in Python's internals, which is many machine operations),
check the type of x (more memory reading),
based on the type, look up a function pointer that implements the not-equal operation (one to arbitrarily many memory reads and one to arbitrarily many conditional branches, with data dependencies between them),
only then can it finally call that function with references to the Python objects holding the values of x and 0 (which is also not "free" - look up function-calling ABIs for more on that).
All that and more has to be done by the CPU even if x is a Python int or float mapping closely to the CPU's native numerical data types.
x = 0
An assignment is actually far cheaper in Python (though still not trivial): it only has to get as far as step 1 above, because once it knows "where" x is, it can just overwrite that pointer with the pointer to the Python object representing 0.
if
Abstractly speaking, the Python if statement has to be able to handle "truthy" and "falsey" values, which in the most naive implementation involves running through more CPU instructions to evaluate what the result of the condition is according to Python's semantics of what's true and what's false.
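You can see part of this indirection yourself with the standard dis module, which shows the bytecode CPython walks through for even this tiny guarded assignment (each bytecode instruction in turn expands into many CPU instructions inside the interpreter loop; exact opcode names vary by Python version):

import dis

def reset(x):
    if x != 0:
        x = 0
    return x

dis.dis(reset)  # prints LOAD_FAST, COMPARE_OP, a conditional jump, STORE_FAST, ...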
Sidenote About Optimizations
Different Python implementations go to different lengths to get Python operations down to as few CPU operations as possible. For example, an optimizing JIT (Just-In-Time) compiler might notice that, inside some loop over an array, all elements of the array are native integers and actually reduce the if x != 0 and x = 0 parts to their respective minimal machine instructions, but that only happens in very specific circumstances when the optimizing logic can prove that it can safely bypass a lot of the behavior it would normally need to perform.
The biggest thing here is this: a high-level language like Python is so removed from the hardware that "simple" operations are often complex "under the covers".
What You Asked vs. What I Think You Wanted To Ask
Correct me if I'm wrong, but I suspect the use-case you actually had in mind was this:
if x != 0:
    # some code
    x = 0

vs. this:

if x != 0:
    # some code
x = 0
In that case, the first option is superior to the second, because you are already paying the cost of if x != 0 anyway.
Last Point of Emphasis
The hardest breakthrough for me was moving away from trying to reason about individual instructions in my head, and instead switching into looking at how things work and measuring real systems.
Looking at how things work will teach you how to optimize, but measuring will show you where to optimize.
This question is great for exploring the former, but for your stated motivation of reducing power consumption on your laptop, you would benefit more from the latter.
Each sample is an array of features (ints). I need to split my samples into two separate groups by figuring out what the best feature, and the best splitting value for that feature, is. By "best", I mean the split that gives me the greatest entropy difference between the pre-split set and the weighted average of the entropy values of the left and right sides. I need to try all (2^m − 2)/2 possible ways to partition these items into two non-empty lists, where m is the number of distinct values (all samples with the same value for that feature are moved together as a group).
The following is extremely slow, so I need a more reasonable/faster way of doing this.
sorted_by_feature is a list of (feature_value, 0_or_1) tuples.
same_vals = {}
for ele in sorted_by_feature:
    if ele[0] not in same_vals:
        same_vals[ele[0]] = [ele]
    else:
        same_vals[ele[0]].append(ele)

l = same_vals.keys()
orderings = list(itertools.permutations(l))

for ordering in orderings:
    list_tups = []
    for dic_key in ordering:
        list_tups += same_vals[dic_key]
    left_1 = 0
    left_0 = 0
    right_1 = num_one
    right_0 = num_zero
    for index, tup in enumerate(list_tups):
        # 0's or 1's on the left +/- 1
        # calculate entropy on left/right, calculate entropy drop, etc.
Trivial details (continuing the code above):
        if index == len(sorted_by_feature) - 1:
            break
        if tup[1] == 1:
            left_1 += 1
            right_1 -= 1
        if tup[1] == 0:
            left_0 += 1
            right_0 -= 1
        # only calculate entropy if values to left and right of split are different
        if list_tups[index][0] != list_tups[index+1][0]:
tl;dr
You're asking for a miracle. No programming language can help you out of this one. Use better approaches than what you're considering doing!
Your Solution has Exponential Time Complexity
Let's assume a perfect algorithm: one that can give you a new partition in constant O(1) time. In other words, no matter what the input, a new partition can be generated in a guaranteed constant amount of time.
Let's in fact go one step further and assume that your algorithm is only CPU-bound and is operating under ideal conditions. Under ideal circumstances, a high-end CPU can process upwards of 100 billion instructions per second. Since this algorithm takes O(1) time, we'll say, oh, that every new partition is generated in a hundred billionth of a second. So far so good?
Now you want this to perform well. You say you want this to be able to handle an input of size m. You know that means you need about pow(2,m) iterations of your algorithm - that's the number of partitions you need to generate, and since generating each partition takes a finite amount of time O(1), the total time is just pow(2,m) times O(1). Let's take a quick look at the numbers here:
m = 20 means your time taken is pow(2,20)*10^-11 seconds = 0.00001 seconds. Not bad.
m = 40 means your time taken is pow(2,40)*10^-11 seconds ≈ 1 trillion / 100 billion = about 10 seconds. Also not bad, but note how small m = 40 is. In the vast panopticon of numbers, 40 is nothing. And remember we're assuming ideal conditions.
m = 100 means pow(2,100)*10^-11 seconds ≈ 10^19 seconds - hundreds of billions of years! What happened?
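If you want to reproduce these back-of-the-envelope numbers yourself:

# idealized model: 10**-11 seconds per generated partition
for m in (20, 40, 100):
    seconds = 2**m * 1e-11
    print(m, seconds, "seconds =", seconds / (3600 * 24 * 365), "years")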
You're a victim of algorithmic theory. Simply put, a solution that has exponential time complexity - any solution that takes 2^m time to complete - cannot be sped up by better programming. Generating or producing pow(2,m) outputs is always going to take you the same proportion of time.
Note further that 100 billion instructions/sec is an ideal for high-end desktop computers - your CPU also has to worry about processes other than this program you're running, in which case kernel interrupts and context switches eat into processing time (especially when you're running a few thousand system processes, which you no doubt are). Your CPU also has to read and write from disk, which is I/O bound and takes a lot longer than you think. Interpreted languages like Python also eat into processing time since each line is dynamically converted to bytecode, forcing additional resources to be devoted to that. You can benchmark your code right now and I can pretty much guarantee your numbers will be way higher than the simplistic calculations I provide above. Even worse: storing 2^40 permutations requires 1000 GBs of memory. Do you have that much to spare? :)
Switching to a lower-level language, using generators, etc. is all a pointless affair: they're not the main bottleneck, which is simply the large and unreasonable time complexity of your brute force approach of generating all partitions.
What You Can Do Instead
Use a better algorithm. Generating pow(2,m) partitions and investigating all of them is an unrealistic ambition. You want, instead, to consider a dynamic programming approach. Instead of walking through the entire space of possible partitions, you want to only consider walking through a reduced space of optimal solutions only. That is what dynamic programming does for you. An example of it at work in a problem similar to this one: unique integer partitioning.
Dynamic programming approaches work best on problems that can be formulated as linearized directed acyclic graphs (Google it if you're not sure what I mean!).
If a dynamic approach is out, consider investing in parallel processing with a GPU instead. Your computer already has a GPU - it's what your system uses to render graphics - and GPUs are built to be able to perform large numbers of calculations in parallel. A parallel calculation is one in which different workers can do different parts of the same calculation at the same time - the net result can then be joined back together at the end. If you can figure out a way to break this into a series of parallel calculations - and I think there is good reason to suggest you can - there are good tools for GPU interfacing in Python.
Other Tips
Be very explicit on what you mean by best. If you can provide more information on what best means, we folks on Stack Overflow might be of more assistance and write such an algorithm for you.
Using a bare-metal compiled language might help reduce the amount of real time your solution takes in ordinary situations, but the difference in this case is going to be marginal. Compiled languages are useful when you have to do operations like searching through an array efficiently, since there's no instruction-compilation overhead at each iteration. They're not all that more useful when it comes to generating new partitions, because that's not something that removing the dynamic bytecode generation barrier actually affects.
A couple of minor improvements I can see:
Use try/except instead of if not in to avoid a double lookup of keys
if ele[0] not in same_vals:
    same_vals[ele[0]] = [ele]
else:
    same_vals[ele[0]].append(ele)

# Should be changed to

try:
    same_vals[ele[0]].append(ele)  # Most of the time this will work
except KeyError:
    same_vals[ele[0]] = [ele]
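A third option, assuming you don't need plain-dict semantics elsewhere, is collections.defaultdict, which avoids both the double lookup and the exception handling (sorted_by_feature below is just a tiny stand-in for your list of tuples):

from collections import defaultdict

sorted_by_feature = [(5, 1), (5, 0), (7, 1)]  # example (feature_value, 0_or_1) data

same_vals = defaultdict(list)
for ele in sorted_by_feature:
    same_vals[ele[0]].append(ele)  # the missing list is created automatically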
Don't explicitly convert a generator to a list if you don't have to. I don't immediately see any need for your conversion to a list, which only slows things down:
orderings = list(itertools.permutations(l))
for ordering in orderings:
# Should be changed to
for ordering in itertools.permutations(l):
But, like I said, these are only minor improvements. If you really needed this to be faster, consider using a different language.
So I just learned about sorting algorithms - bubble, merge, insertion sort, etc. They all seem to be very similar in their methods of sorting, with what seems to me minimal changes in their approach. So why do they produce such different sorting times, i.e. O(n^2) vs O(n log n), for example?
The "similarity" (?!) that you see is completely illusory.
The elementary, O(N squared), approaches, repeat their workings over and over, without taking any advantage, for the "next step", of any work done on the "previous step". So the first step takes time proportional to N, the second one to N-1, and so on -- and the resulting sum of integers from 1 to N is proportional to N squared.
For example, in selection sort, you are looking each time for the smallest element in the I:N section, where I is at first 0, then 1, etc. This is (and must be) done by inspecting all those elements, because no care was previously taken to afford any lesser amount of work on subsequent passes by taking any advantage of previous ones. Once you've found that smallest element, you swap it with the I-th element, increment I, and continue. O(N squared) of course.
The advanced, O(N log N), approaches, are cleverly structured to take advantage in following steps of work done in previous steps. That difference, compared to the elementary approaches, is so pervasive and deep, that, if one cannot perceive it, that speaks chiefly about the acuity of one's perception, not about the approaches themselves:-).
For example, in merge sort, you logically split the array into two sections, 0 to half-length and half-length to length. Once each half is sorted (recursively by the same means, until the length gets short enough), the two halves are merged, which itself is a linear sub-step.
Since you're halving every time, you clearly need a number of steps proportional to log N, and, as each step is O(N), obviously you get the very desirable O(N log N) as a result.
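A compact sketch of that structure (not Python's built-in sort, just an illustration of the halving-plus-linear-merge shape):

def merge_sort(items):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # log N levels of halving...
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # ...each with a linear merge step
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged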
Python's "timsort" is a "natural mergesort", i.e, a variant of mergesort tuned to take advantage of already-sorted (or reverse-sorted) parts of the array, which it recognizes rapidly and avoids spending any further work on. This doesn't change big-O because that's about worst-case time -- but expected time crashes much further down because in so many real-life cases some partial sortedness is present.
(Note that, going by the rigid definition of big-O, quicksort isn't quick at all -- it's worst-case proportional to N squared, when you just happen to pick a terrible pivot each and every time... expected-time wise it's fine, though nowhere as good as timsort, because in real life the situations where you repeatedly pick a disaster pivot are exceedingly rare... but, worst-case, they might happen!-).
timsort is so good as to blow away even very experienced programmers. I don't count because I'm a friend of the inventor, Tim Peters, and a Python fanatic, so my bias is obvious. But, consider...
...I remember a "tech talk" at Google where timsort was being presented. Sitting next to me in the front row was Josh Bloch, then also a Googler, and Java expert extraordinaire. Less than mid-way through the talk he couldn't resist any more - he opened his laptop and started hacking to see if it could possibly be as good as the excellent, sharp technical presentation seemed to show it would be.
As a result, timsort is now also the sorting algorithm in recent releases of the Java Virtual Machine (JVM), though only for user-defined objects (arrays of primitives are still sorted the old way, quickersort [*] I believe -- I don't know which Java peculiarities determined this "split" design choice, my Java-fu being rather weak:-).
[*] that's essentially quicksort plus some hacks for pivot choice to try and avoid the poison cases -- and it's also what Python used to use before Tim Peters gave this one immortal contribution out of the many important ones he's made over the decades.
The results are sometimes surprising to people with CS background (like Tim, I have the luck of having a far-ago academic background, not in CS, but in EE, which helps a lot:-). Say, for example, that you must maintain an ever-growing array that is always sorted at any point in time, as new incoming data points must get added to the array.
The classic approach would use bisection, O(log N), to find the proper insertion point for each new incoming data point -- but then, to put the new data in the right place, you need to shift what comes after it by one slot, that's O(N).
With timsort, you just append the new data point to the array, then sort the array -- that's O(N) for timsort in this case (as it's so awesome in exploiting the already-sorted nature of the first N-1 items!-).
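Both approaches are a couple of lines of standard-library Python, if you want to compare them yourself (which one actually wins on your data is something to measure, not guess):

import bisect

a = [1, 3, 5, 9]
bisect.insort(a, 4)   # O(log N) search, then an O(N) shift to make room

b = [1, 3, 5, 9]
b.append(4)
b.sort()              # timsort exploits the already-sorted prefix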
You can think of timsort as pushing the "take advantage of work previously done" to a new extreme -- where not only work previously done by the algorithm itself, but also other influences by other aspects of real-life data processing (causing segments to be sorted in advance), are all exploited to the hilt.
Then we could move into bucket sort and radix sort, which change the plane of discourse -- which in traditional sorting limits one to being able to compare two items -- by exploiting the items' internal structure.
Or a similar example -- presented by Bentley in his immortal book "Programming Pearls" -- of needing to sort an array of several million unique positive integers, each constrained to be 24 bits long.
He solved it with an auxiliary array of 16M bits -- just 2M bytes after all -- initially all zeroes: one pass through the input array to set the corresponding bits in the auxiliary array, then one pass through the auxiliary array to form the required integers again where 1s are found -- and bang, O(N) [and very speedy:-)] sorting for this special but important case!-)
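A minimal sketch of that bit-vector idea in Python (assuming unique, non-negative integers below 2**24, as in Bentley's setting):

def bitvector_sort(values, limit=2**24):
    bits = bytearray(limit // 8)           # 16M bits = 2M bytes, all zero
    for v in values:                       # one pass: mark each value's bit
        bits[v >> 3] |= 1 << (v & 7)
    return [v for v in range(limit)        # one pass: read the bits back in order
            if bits[v >> 3] & (1 << (v & 7))]

print(bitvector_sort([9, 2, 7, 4], limit=16))  # [2, 4, 7, 9]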
Which is more CPU intensive, to do an if(x==num): check, or to do a sum x+y?
Your question is somewhat incomplete because you are comparing two different operations. If you need to add two things together then testing x==y isn't going to get you anywhere. So presumably you want to compare
if y != 0:
    sum += y

with

sum += y
It's a lot more complex for interpreted languages like Python, but on the hardware a test for non-zero introduces a branch and that in itself can be expensive. But I wouldn't want to say which would be faster without timing.
Throw into the equation different performance characteristics of different architectures and you have another confounding factor.
As always, you are best to write your code in the most natural maintainable way first and then time it. If you feel you need to extract more performance use a profiler to find hot spots and then optimise.
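If you do want numbers, timeit is the usual tool; a rough sketch comparing the two fragments from the question (results will vary across machines and Python versions):

import timeit

guarded = timeit.timeit("if b != 0:\n    b = 0", setup="b = 0", number=1_000_000)
plain = timeit.timeit("b = 0", number=1_000_000)
print("guarded:", guarded, "plain:", plain)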