Iterative Divide and Conquer algorithms - python

I am trying to create an algorithm using the divide-and-conquer approach but using an iterative algorithm (that is, no recursion).
I am confused as to how to approach the loops.
I need to break up my problems into smaller sub problems, until I hit a base case. I assume this is still true, but then I am not sure how I can (without recursion) use the smaller subproblems to solve the much bigger problem.
For example, I am trying to come up with an algorithm that will find the closest pair of points (in one-dimensional space - though I intend to generalize this on my own to higher dimensions). If I had a function closest_pair(L) where L is a list of integer co-ordinates in ℝ, how could I come up with a divide and conquer ITERATIVE algorithm that can solve this problem?
(Without loss of generality I am using Python)

The cheap way to turn any recursive algorithm into an iterative one is to take the recursive function, put its body in a loop, and manage your own stack. This eliminates the function-call overhead and avoids saving any unneeded data on the stack. However, this is not usually the "best" approach ("best" depends on the problem and context).
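As a rough sketch of the explicit-stack idea applied to the 1-D closest-pair problem: the version below assumes the points are sorted first (my assumption, the question doesn't require it), so the only pair that can straddle a split is the two neighbours at the midpoint. The name closest_gap_with_stack is made up for illustration, and because the "combine" step is just a running minimum, nothing has to be carried back up the stack.

```python
def closest_gap_with_stack(points):
    """Smallest distance between any two 1-D points, without recursion."""
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two points")

    best = float("inf")
    stack = [(0, len(pts))]          # half-open index ranges still to split
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:              # base case: a single point has no pair
            continue
        mid = (lo + hi) // 2
        # Combine step: the only pair crossing the split is (mid - 1, mid).
        best = min(best, pts[mid] - pts[mid - 1])
        # "Recursive calls" become pushes onto our own stack.
        stack.append((lo, mid))
        stack.append((mid, hi))
    return best
```

For problems where the combine step needs the results of both halves, you would also have to push a marker for the pending combine, which is where this generic approach starts to get clumsy.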
The way you've worded your problem, it sounds like the idea is to break the list into sublists, find the closest pair in each, and then take the closest pair out of those two results. To do this iteratively, I think a better approach than the generic one mentioned above is to start the other way around: look at lists of size 3 (there are three pairs to look at) and work your way up from there. Note that lists of size 2 are trivial.
Lastly, if your coordinates are integers, they are in ℤ (a much smaller subset of ℝ).
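The bottom-up idea can be sketched roughly as follows. This is only one possible reading of it: I start from blocks of width 1 rather than 3 to keep the base case trivial, I sort the points first (an assumption on my part), and the names closest_pair, best, and width are just illustrative.

```python
def closest_pair(points):
    """Return (distance, (a, b)) for the closest pair of 1-D points, bottom-up."""
    pts = sorted(points)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points")

    # best[i] is the closest pair found so far in the block starting at index i.
    # Base case: a block of width 1 contains no pair at all.
    best = {i: (float("inf"), None) for i in range(n)}

    width = 1
    while width < n:
        # Merge neighbouring blocks [lo, mid) and [mid, mid + width).
        for lo in range(0, n, 2 * width):
            mid = lo + width
            if mid >= n:
                continue             # no right-hand block to merge with
            # The only pair crossing the boundary is the two adjacent points.
            cross = (pts[mid] - pts[mid - 1], (pts[mid - 1], pts[mid]))
            best[lo] = min(best[lo], best[mid], cross, key=lambda r: r[0])
        width *= 2
    return best[0]
```

Each pass doubles the block width, so the loops do the same work as the recursion would, just from the leaves upward.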

Related

Simultaneous recursion on equivalent graphs using Python

I have a structure, looking a lot like a graph but I can 'sort' it. Therefore I can have two graphs, that are equivalent, but one is sorted and not the other. My goal is to compute a minimal dominant set (with a custom algorithm that fits my specific problem, so please do not link to other 'efficient' algorithms).
The thing is, I search for dominant sets of size one, then two, etc until I find one. If there isn't a dominant set of size i, using the sorted graph is a lot more efficient. If there is one, using the unsorted graph is much better.
I thought about using threads/multiprocessing, so that both graphs are explored at the same time and once one finds an answer (no solution or a specific solution), the other one stops and we go to the next step or end the algorithm. This didn't work: it just made the process much slower (even though I would expect it to just double the time required for each step, compared to using the optimal graph without threads/multiprocessing).
I don't know why this didn't work and wonder if there is a better way, one that maybe doesn't even require the use of threads/multiprocessing. Any clue?
If you don't want an algorithm suggestion, then lazy evaluation seems like the way to go.
Set up two solver objects, one per graph type, each exposing something like class_instance.next_step(work_to_do_this_step). You can then have each graph move one "step" (whatever you define a step to be) forward in turn. By careful selection (possibly dynamically, based on how things are going) of what a step is, you can efficiently balance how much work/time is spent on the sorted vs unsorted graph approaches. Of course this is only useful if there is at least a chance that either algorithm may finish before the other.
In theory if you can independently define what those steps are, then you could split up the work to run them in parallel, but it's important that each process/thread is doing roughly the same amount of "work" so they all finish about the same time. Though writing parallel algorithms for these kinds of things can be a bit tricky.
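A minimal sketch of the alternating-steps idea, using generators instead of a class with next_step. Everything here is hypothetical: run_alternating and toy_solver are made-up names, and the convention that a solver yields None while it is still working and yields its answer when done is my own assumption, not something from the question.

```python
def run_alternating(solver_a, solver_b):
    """Advance two generator-based solvers in turn; return the first answer."""
    solvers = [solver_a, solver_b]
    while solvers:
        for solver in list(solvers):
            try:
                result = next(solver)    # do one "step" of work
            except StopIteration:
                solvers.remove(solver)   # this solver gave up without an answer
                continue
            if result is not None:
                return result            # first finisher wins, the other is dropped
    return None


def toy_solver(name, steps_needed):
    """Stand-in for a real search: works for a few steps, then answers."""
    for _ in range(steps_needed):
        yield None
    yield name


winner = run_alternating(toy_solver("sorted", 5), toy_solver("unsorted", 2))
```

With single-threaded alternation like this, the cost per round is roughly the sum of one step from each solver, so the total is at most about twice the faster solver's work, which is what the question expected from threads.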
Sounds like you're not doing what you describe. Possibly you're waiting for BOTH to finish somehow? Try doing that, and seeing if the time changes.

Time complexity of python "set.intersection" for n sets

I want to know the time complexity of Python's set.intersection. I looked in the documentation and the online wikis for Python, but I did not find the time complexity of this method for multiple sets.
The Python wiki on time complexity lists a single intersection as O(min(len(s), len(t))), where s and t are the two sets being intersected (t must itself be a set, not an arbitrary iterable). (In English: the time is bounded by and linear in the size of the smaller set.)
Note: based on the comments below, this wiki entry was wrong for the case where the argument passed is not a set. I've corrected the wiki entry.
If you have n sets (sets, not iterables), you'll do n-1 intersections and the time can be (n-1)·O(len(s)), where s is the set with the smallest size.
Note that as you do an intersection the result may get smaller, so although the O bound is the worst case, in practice the time will often be better than this.
However, looking at the specific code this idea of taking the min() only applies to a single pair of sets and doesn't extend to multiple sets. So in this case, we have to be pessimistic and take s as the set with the largest size.
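If you want the optimistic bound rather than the pessimistic one, a possible workaround (assuming every input really is a set) is to do the pairwise intersections yourself, starting from the smallest set. The running result can then never be larger than the smallest set, so by the single-pair bound quoted above each step is at most linear in that size. intersect_all is a hypothetical helper, not something from the standard library.

```python
def intersect_all(sets):
    """Intersect a non-empty list of sets, smallest first."""
    ordered = sorted(sets, key=len)
    result = set(ordered[0])         # copy, so the inputs are left untouched
    for other in ordered[1:]:
        result &= other              # the result can only shrink from here
        if not result:
            break                    # nothing left to intersect
    return result
```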

Which data structure is appropriate for this?

I have a line in my code that currently does this at each step x:
myList = [(lo,hi) for lo,hi in myList if lo <= x <= hi]
This is pretty slow. Is there a more efficient way to eliminate things from a list that don't contain a given x?
Perhaps you're looking for an interval tree. From Wikipedia:
In computer science, an interval tree is an ordered tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point.
So, instead of storing the (lo, hi) pairs sequentially in a list, you would have them define the intervals in an interval tree. Then you could perform queries on the tree with x, and retain only the intervals that overlap x.
While you don't give much context, I'll assume the rest of the loop looks like:
for x in xlist:
    myList = [(lo,hi) for lo,hi in myList if lo <= x <= hi]
In this case, it may be more efficient to construct an interval tree (http://en.wikipedia.org/wiki/Interval_tree) first. Then, for each x, you walk the tree and find all intervals which contain x; add these intervals to a set as you find them.
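For illustration, here is a small sketch of one interval-tree variant (a "centered" tree that supports point queries). It is not tuned, it assumes closed intervals with lo <= hi, and the class name IntervalNode and method name stab are my own.

```python
class IntervalNode:
    """Centered interval tree over a non-empty list of closed (lo, hi) pairs."""

    def __init__(self, intervals):
        endpoints = sorted(p for iv in intervals for p in iv)
        self.center = endpoints[len(endpoints) // 2]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        # Intervals containing the center, sorted both ways for early exit.
        self.by_lo = sorted(here)
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def stab(self, x):
        """Return every stored interval that contains the point x."""
        found = []
        if x <= self.center:
            for iv in self.by_lo:        # all of these end at or after center
                if iv[0] > x:
                    break
                found.append(iv)
            if x < self.center and self.left:
                found.extend(self.left.stab(x))
        else:
            for iv in self.by_hi:        # all of these start at or before center
                if iv[1] < x:
                    break
                found.append(iv)
            if self.right:
                found.extend(self.right.stab(x))
        return found


tree = IntervalNode([(0, 5), (6, 9), (2, 3), (4, 10)])
hits = tree.stab(4)
```

Note that this only answers "which intervals contain x"; it does not shrink the collection between steps the way the original list comprehension does.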
Here I'm going to suggest what may seem like a really dumb solution favoring micro-optimizations over algorithmic ones. It'll depend on your specific needs.
The ultimate question is this: is a single linear pass over your array (list in Python), on average, expensive? In other words, is searching for lo/hi pairs that contain x generally going to yield results that are very small (e.g. 1% of the overall size of the list), or relatively large (e.g. 25% or more of the original list)?
If the answer is the latter, you might actually get a more efficient solution keeping a basic, contiguous, cache-friendly representation that you're accessing sequentially. The hardware cache excels at plowing through contiguous data where multiple adjacent elements fit into a cache line sequentially.
What you want to avoid in such a case is the expensive linear-time removal from the middle of the array as well as possibly the construction of a new one. If you trigger a linear-time operation for every single individual element you remove from the array, then naturally that's going to get very expensive very quickly.
To exchange that linear-time operation for a much faster constant-time one, all we have to do when we want to remove an element at a certain index in the array is to overwrite the element at that index with the element at the back of the array (last element). Now simply remove the redundant duplicate from the back of the array (a removal from the back of an array is a constant-time operation, and often involves just basic arithmetical instructions).
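In Python the swap-with-last removal might look something like this. It is a sketch only: keep_containing is a made-up name, it modifies the list in place, and it does not preserve the order of the surviving pairs.

```python
def keep_containing(intervals, x):
    """Drop, in place, every (lo, hi) pair that does not contain x."""
    i = 0
    while i < len(intervals):
        lo, hi = intervals[i]
        if lo <= x <= hi:
            i += 1                           # keep it, move on
        else:
            intervals[i] = intervals[-1]     # overwrite with the last element
            intervals.pop()                  # constant-time removal from the back
            # Do not advance i: the element moved into slot i is still unchecked.


pairs = [(0, 5), (6, 9), (2, 3), (4, 10)]
keep_containing(pairs, 4)
```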
If your needs fit the criteria, then this can actually give you better results than a smarter algorithm. It's one of the peculiar cases where the practice can trump the theory due to the skewed performance of the hardware cache over DRAM, but if you're performing these types of hi/lo queries repeatedly and wanting to get very narrow results, then something smarter like an interval tree or at least sorting the data to allow binary searches can be considerably better.

How to calculate the algorithmic complexity of Python functions? [duplicate]

When required to show how efficient the algorithm is, we need to show the algorithmic complexity of functions - Big O and so on. In Python code, how can we show or calculate the bounds of functions?
In general, there's no way to do this programmatically (you run into the halting problem).
If you have no idea where to start, you can gain some insight into how a function will perform by running some benchmarks (e.g. using the time module) with inputs of various sizes. You can even collect enough data to form a suspicion about what the runtime might be. But this won't give you a rigorous answer - for that, you need to prove mathematically that your suspected bound is in fact true.
For instance, if I'm playing with a sorting function and observe that the time is increasing roughly proportionally to the square of the input size, I might suspect that the complexity of this sort is O(n**2). But this does not constitute proof - in particular, some algorithms that perform well under typical inputs have pathological inputs that result in very poor performance.
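A quick way to collect such measurements, as a sketch: measure and its parameters are names I made up, and the numbers it returns will vary from machine to machine and run to run, so treat them as hints rather than evidence.

```python
import time


def measure(func, sizes, make_input):
    """Time func on inputs of several sizes; returns a list of (n, seconds)."""
    timings = []
    for n in sizes:
        data = make_input(n)                 # build the input outside the timed region
        start = time.perf_counter()
        func(data)
        timings.append((n, time.perf_counter() - start))
    return timings


# Example: time the built-in sort on reversed lists of growing size.
results = measure(sorted, [1_000, 2_000, 4_000], lambda n: list(range(n, 0, -1)))
```

If doubling n roughly quadruples the time, a quadratic bound is a reasonable suspicion; if it roughly doubles, something closer to linear or n log n is more likely.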
To prove that the bound is in fact O(n**2), I need to look at what the algorithm is doing in the worst case - in this example, I might be analysing a selection sort, which repeatedly sweeps across the entire unsorted portion of the list and picks the lowest unsorted number. It should be evident that I'm examining something like n*(n-1)/2 == O(n**2) elements. If examining elements is a constant-time operation, and placing the final element in the correct place is also not worse than O(n**2), then it follows that my entire algorithm is O(n**2).
If you're trying to get the big O notation for your own functions, you probably need variables keeping track of things like the run time, the number of comparisons, the number of iterations, etc., as well as some calculation investigating how these correspond to the size of your data.
It's probably best to do this manually first, so you can check your understanding of an algorithm.
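As an example of the counter idea, here is a selection sort instrumented to count comparisons. The function name and the choice to return the count alongside the sorted list are mine; the point is only that the count can be checked against the hand-derived n*(n-1)/2.

```python
def selection_sort_count(items):
    """Selection sort that also reports how many comparisons it made."""
    a = list(items)                  # work on a copy
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):    # sweep the unsorted portion
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a, comparisons


for n in (10, 20, 40):
    _, count = selection_sort_count(range(n, 0, -1))
    print(n, count, n * (n - 1) // 2)    # measured count next to the formula
```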

0/1 Knapsack with few variables: which algorithm?

I have to implement the solution to a 0/1 Knapsack problem with constraints.
My problem will have in most cases few variables (~ 10-20, at most 50).
I recall from university that there are a number of algorithms that in many cases perform better than brute force (I'm thinking, for example, of a branch and bound algorithm).
Since my problem is relatively small, I'm wondering if there is an appreciable advantage in terms of efficiency when using a sophisticated solution as opposed to brute force.
If it helps, I'm programming in Python.
You can use a pseudopolynomial algorithm based on dynamic programming if the sum of weights is small enough. You just calculate, for each X and Y, whether you can get weight X with the first Y items.
This runs in O(NS) time, where N is the number of items and S is the sum of weights.
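A sketch of the dynamic-programming approach, in its usual value-maximising form. The question doesn't spell out its constraints, so I'm assuming items are (weight, value) pairs with non-negative integer weights and a single capacity limit; knapsack_dp is a made-up name. The table here is indexed by capacity, which is never more than the sum of weights that matters.

```python
def knapsack_dp(items, capacity):
    """Best total value of a subset of (weight, value) items within capacity."""
    # best[w] = best value achievable with total weight at most w.
    best = [0] * (capacity + 1)
    for weight, value in items:
        # Go downwards so each item is used at most once (the 0/1 part).
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]


items = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]
```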
Another possibility is to use a meet-in-the-middle approach.
Partition items into two halves and:
For the first half take every possible combination of items (there are 2^(N/2) possible combinations in each half) and store its weight in some set.
For the second half take every possible combination of items and check whether there is a combination in first half with suitable weight.
This should run in O(2^(N/2)) time.
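Roughly, for the "can I hit exactly this weight" version of the problem described in the steps above (the function names are hypothetical, and I'm assuming integer weights):

```python
def subset_sums(weights):
    """All distinct total weights reachable by some subset of weights."""
    sums = {0}                                   # the empty subset
    for w in weights:
        sums |= {s + w for s in sums}            # each sum, with and without w
    return sums


def has_subset_with_weight(weights, target):
    """Meet in the middle: is there a subset whose weights add up to target?"""
    half = len(weights) // 2
    left = subset_sums(weights[:half])           # up to 2^(N/2) sums
    right = subset_sums(weights[half:])          # up to 2^(N/2) sums
    # For each right-half sum, look for the complementary left-half sum.
    return any(target - s in left for s in right)
```

Adapting this to maximise value under a capacity needs more bookkeeping (for example, sorting one half by weight and searching it), which I've left out.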
Brute force would work fine for 10 variables, but for, say, 40 you'd get about 10^12 (2^40) possible solutions, which would probably take too long to enumerate. I'd consider approximate algorithms, e.g. a polynomial-time approximation algorithm (see, e.g. http://math.mit.edu/~goemans/18434S06/knapsack-katherine.pdf), or a search algorithm such as branch-and-bound, maybe with an additional heuristic.
Brute force algorithms will always return the best solutions. The problem with them is that for problems of exponential order they quickly become infeasible.
If you are guaranteed to have up to 20 variables, you will test no more than about 1 million solutions (2^20 ≈ 1M). Hence, brute force is feasible and no other algorithm will return a better solution.
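For completeness, a brute-force sketch. It assumes items are (weight, value) pairs and a single capacity constraint, which the question doesn't confirm, and knapsack_brute_force is a made-up name.

```python
from itertools import combinations


def knapsack_brute_force(items, capacity):
    """Try every subset of (weight, value) items; return (best_value, subset)."""
    best_value, best_subset = 0, ()
    for r in range(len(items) + 1):
        for subset in combinations(items, r):    # all 2^N subsets in total
            weight = sum(w for w, _ in subset)
            value = sum(v for _, v in subset)
            if weight <= capacity and value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset


best_value, best_subset = knapsack_brute_force(
    [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)], 15
)
```

Extra constraints can be added to the `if` test without changing anything else, which is the main attraction of brute force at this size.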
Heuristics are great, but they should be used only when we have no exact solution to the problem. There is a great book that might help you: How to Solve It: Modern Heuristics, by Michalewicz and Fogel.
