LFU cache implementation in Python

I have implemented an LFU cache in Python with the help of the priority queue implementation given at
https://docs.python.org/2/library/heapq.html#priority-queue-implementation-notes
The code is given at the end of the post.
But I feel that the code has some serious problems:
1. To give a scenario, suppose only one page is continuously getting visited (say 50 times). This code will always mark the already added node as "removed" and push a fresh node onto the heap, so it ends up with 50 different nodes for the same page, increasing the heap size enormously.
2. This question is almost the same as Q1 of the telephonic interview at
http://www.geeksforgeeks.org/flipkart-interview-set-2-sde-2/
There, the person mentioned that a doubly linked list can give better efficiency compared to a heap. Can anyone explain to me how?
from llist import dllist
import sys
from heapq import heappush, heappop

class LFUCache:
    heap = []
    cache_map = {}
    REMOVED = "<removed-task>"

    def __init__(self, cache_size):
        self.cache_size = cache_size

    def get_page_content(self, page_no):
        if self.cache_map.has_key(page_no):
            self.update_frequency_of_page_in_cache(page_no)
        else:
            self.add_page_in_cache(page_no)
        return self.cache_map[page_no][2]

    def add_page_in_cache(self, page_no):
        if len(self.cache_map) == self.cache_size:
            self.delete_page_from_cache()
        heap_node = [1, page_no, "content of page " + str(page_no)]
        heappush(self.heap, heap_node)
        self.cache_map[page_no] = heap_node

    def delete_page_from_cache(self):
        while self.heap:
            count, page_no, page_content = heappop(self.heap)
            if page_content is not self.REMOVED:
                del self.cache_map[page_no]
                return

    def update_frequency_of_page_in_cache(self, page_no):
        heap_node = self.cache_map[page_no]
        heap_node[2] = self.REMOVED
        count = heap_node[0]
        heap_node = [count + 1, page_no, "content of page " + str(page_no)]
        heappush(self.heap, heap_node)
        self.cache_map[page_no] = heap_node

def main():
    cache_size = int(raw_input("Enter cache size "))
    cache = LFUCache(cache_size)
    while 1:
        page_no = int(raw_input("Enter page no needed "))
        print cache.get_page_content(page_no)
        print cache.heap, cache.cache_map, "\n"

if __name__ == "__main__":
    main()

Efficiency is a tricky thing. In real-world applications, it's often a good idea to use the simplest and easiest algorithm, and only start to optimize when that's measurably slow. And then you optimize by doing profiling to figure out where the code is slow.
If you are using CPython, it gets especially tricky, as even an inefficient algorithm implemented in C can beat an efficient algorithm implemented in Python due to the large constant factors; e.g. a double-linked list implemented in Python tends to be a lot slower than simply using the normal Python list, even for cases where in theory it should be faster.
Simple algorithm:
For an LFU, the simplest algorithm is to use a dictionary that maps keys to (item, frequency) objects, and update the frequency on each access. This makes access very fast (O(1)), but pruning the cache is slower as you need to sort by frequency to cut off the least-used elements. For certain usage characteristics, this is actually faster than other "smarter" solutions, though.
You can optimize for this pattern by not simply pruning your LFU cache to the maximum length, but to prune it to, say, 50% of the maximum length when it grows too large. That means your prune operation is called infrequently, so it can be inefficient compared to the read operation.
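As a rough illustration of this simple approach (not the poster's code; the class name, the load_value callback and the 50% pruning threshold are made up for the sketch):

class SimpleLFU:
    def __init__(self, max_size):
        self.max_size = max_size
        self.data = {}                      # key -> [value, frequency]

    def get(self, key, load_value):
        entry = self.data.get(key)
        if entry is not None:
            entry[1] += 1                   # O(1) access path
            return entry[0]
        if len(self.data) >= self.max_size:
            self.prune()
        value = load_value(key)             # hypothetical loader for missing keys
        self.data[key] = [value, 1]
        return value

    def prune(self):
        # Called infrequently: sort by frequency and keep the most-used half.
        keep = sorted(self.data.items(), key=lambda kv: kv[1][1], reverse=True)
        self.data = dict(keep[:self.max_size // 2])

Access stays a dictionary lookup plus an increment; all of the sorting cost is concentrated in the rare prune() call.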
Using a heap:
In (1), you used a heap because that's an efficient way of storing a priority queue. But you are not implementing a priority queue. The resulting algorithm is optimized for pruning, but not access: You can easily find the n smallest elements, but it's not quite as obvious how to update the priority of an existing element. In theory, you'd have to rebalance the heap after every access, which is highly inefficient.
To avoid that, you added a trick by keeping elements around even if they are deleted. But this trades space for time.
If you don't want to pay that price in space, you could update the frequencies in place and simply rebalance the heap before pruning the cache. You regain fast access times at the expense of slower pruning, like the simple algorithm above. (I doubt there is any speed difference between the two, but I have not measured this.)
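A minimal sketch of that variant, assuming keys are comparable (class and method names are illustrative, not from the post):

import heapq

class HeapLFU:
    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = {}                   # key -> [frequency, key]
        self.heap = []                      # the same list objects, possibly in stale order

    def access(self, key):
        if key in self.entries:
            self.entries[key][0] += 1       # O(1); the heap order goes stale, which is fine
            return
        if len(self.entries) >= self.max_size:
            self.prune(self.max_size // 2)
        entry = [1, key]
        self.entries[key] = entry
        self.heap.append(entry)

    def prune(self, n):
        heapq.heapify(self.heap)            # rebalance once, right before pruning
        for _ in range(min(n, len(self.heap))):
            freq, key = heapq.heappop(self.heap)
            del self.entries[key]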
Using a double-linked list:
The double-linked list mentioned in (2) takes advantage of the nature of the possible changes here: An element is either added as the lowest priority (0 accesses), or an existing element's priority is incremented exactly by 1. You can use these attributes to your advantage if you design your data structures like this:
You have a double-linked list of elements which is ordered by the frequency of the elements. In addition, you have a dictionary that maps items to elements within that list.
Accessing an element then means:
Either it's not in the dictionary, that is, it's a new item, in which case you can simply append it to the end of the double-linked list (O(1))
or it's in the dictionary, in which case you increment the frequency in the element and move it leftwards through the double-linked list until the list is ordered again (O(n) worst-case, but usually closer to O(1)).
To prune the cache, you simply cut off n elements from the end of the list (O(n)).
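A sketch of that layout might look like the following (hand-rolled nodes instead of the llist package, illustrative names, no error handling):

class Node:
    __slots__ = ("key", "value", "freq", "prev", "next")
    def __init__(self, key, value):
        self.key, self.value, self.freq = key, value, 1
        self.prev = self.next = None

class DLLCache:
    def __init__(self):
        self.head = self.tail = None        # head = most frequent, tail = least frequent
        self.nodes = {}                     # key -> Node

    def access(self, key, value=None):
        node = self.nodes.get(key)
        if node is None:
            node = Node(key, value)
            self.nodes[key] = node
            self._append(node)              # new items start at the tail, O(1)
            return node.value
        node.freq += 1
        # Move leftwards (toward the head) until the list is ordered again.
        while node.prev is not None and node.prev.freq < node.freq:
            self._swap_with_prev(node)
        return node.value

    def _append(self, node):
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node

    def _swap_with_prev(self, node):
        prev = node.prev
        # Unlink node and re-insert it just before prev.
        prev.next = node.next
        if node.next:
            node.next.prev = prev
        else:
            self.tail = prev
        node.prev = prev.prev
        if prev.prev:
            prev.prev.next = node
        else:
            self.head = node
        node.next = prev
        prev.prev = node

    def prune(self, n):
        # Cut n elements off the tail (the least frequently used ones), O(n).
        for _ in range(n):
            if self.tail is None:
                return
            node = self.tail
            del self.nodes[node.key]
            self.tail = node.prev
            if self.tail:
                self.tail.next = None
            else:
                self.head = None

The move-left loop only swaps past neighbours whose frequency is now lower, which is why a single increment is usually O(1).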

Related

out-of-core/external-memory combinatorics in python

I am iterating the search space of valid Python3 ASTs. With max recursion depth = 3, my laptop runs out of memory. My implementation makes heavy use of generators, specifically 'yield' and itertools.product().
Ideally, I'd replace product() and the max recursion depth with some sort of iterative deepening, but first things first:
Are there any libraries or useful SO posts for out-of-core/external-memory combinatorics?
If not... I am considering the feasibility of using either dask or joblib's delayed()... or perhaps wendelin-core's ZBigArray, though I don't like the looks of its interface:
root = dbopen('test.fs')
root['A'] = A = ZBigArray((10,), np.int)
transaction.commit()
Based on this example, I think that my solution would involve an annotation/wrapper function that eagerly converts the generators to ZBigArrays, replacing root['A'] with something like root[get_key(function_name, *function_args)]. It's not pretty, since my generators are not entirely pure -- the output is shuffled. In my current version, this shouldn't be a big deal, but the previous and next versions involve using various NNs and RL rather than mere shuffling.
First things first: the reason you're getting the out-of-memory error is that itertools.product() caches intermediate values. It has no idea whether the function that gave you your generator is idempotent, and even if it did, it wouldn't be able to infer how to call it again given just the generator. This means itertools.product must cache the values of each iterable it's passed.
The solution here is to bite the small performance bullet and either write explicit for loops, or write your own cartesian product function, which takes functions that would produce each generator. For instance:
def product(*funcs, repeat=None):
    if not funcs:
        yield ()
        return
    if repeat is not None:
        funcs *= repeat
    func, *rest = funcs
    for val in func():
        for res in product(*rest):
            yield (val,) + res
from functools import partial
values = product(partial(gen1, arg1, arg2), partial(gen2, arg1))
The bonus from rolling your own here is that you can also change how it goes about traversing the A by B by C ... dimensional search space, so that you could do maybe a breadth-first search instead of an iteratively deepening DFS. Or, maybe pick some random space-filling curve, such as the Hilbert Curve which would iterate all indices/depths of each dimension in your product() in a local-centric fashion.
Apart from that, I have one more thing to point out- you can also implement BFS lazily (using generators) to avoid building a queue that could bloat memory usage as well. See this SO answer, copied below for convenience:
def breadth_first(self):
    yield self
    for c in self.breadth_first():
        if not c.children:
            return  # stop the recursion as soon as we hit a leaf
        yield from c.children
Overall, you will take a performance hit from using semi-coroutines, with zero caching, all in python-land (in comparison to the baked in and heavily optimized C of CPython). However, it should still be doable- algorithmic optimizations (avoiding generating semantically nonsensical ASTs, prioritizing ASTs that suit your goal, etc.) will have a larger impact than the constant-factor performance hit.

When CPython set `in` operator is O(n)?

I was reading about the time complexity of set operations in CPython and learned that the in operator for sets has the average time complexity of O(1) and worst case time complexity of O(n). I also learned that the worst case wouldn't occur in CPython unless the set's hash table's load factor is too high.
This made me wonder: when would such a case occur in the CPython implementation? Is there a simple demo code which shows a set with clearly observable O(n) time complexity of the in operator?
Load factor is a red herring. In CPython sets (and dicts) automatically resize to keep the load factor under 2/3. There's nothing you can do in Python code to stop that.
O(N) behavior can occur when a great many elements have exactly the same hash code. Then they map to the same hash bucket, and set lookup degenerates to a slow form of linear search.
The easiest way to contrive such bad elements is to create a class with a horrible hash function. Like, e.g., and untested:
class C:
    def __init__(self, val):
        self.val = val
    def __eq__(a, b):
        return a.val == b.val
    def __hash__(self):
        return 3
Then hash(C(i)) == 3 regardless of the value of i.
To do the same with builtin types requires deep knowledge of their CPython implementation details. For example, here's a way to create an arbitrarily large number of distinct ints with the same hash code:
>>> import sys
>>> M = sys.hash_info.modulus
>>> set(hash(1 + i*M) for i in range(10000))
{1}
which shows that the ten thousand distinct ints created all have hash code 1.
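Building on that, a small timing sketch (my own illustration, with arbitrary sizes) should make the O(n) lookups directly observable:

import sys
import time

M = sys.hash_info.modulus
N = 10_000

bad = {1 + i * M for i in range(N)}     # N distinct ints that all hash to 1
good = set(range(N))                    # N ordinary ints with distinct hashes

def time_lookups(s, probe, repeats=1000):
    start = time.perf_counter()
    for _ in range(repeats):
        probe in s
    return time.perf_counter() - start

# The colliding probe is found only after walking a long collision chain.
print("colliding set:", time_lookups(bad, 1 + (N - 1) * M))
print("normal set:   ", time_lookups(good, N - 1))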
You can view the set source here which can help: https://github.com/python/cpython/blob/723f71abf7ab0a7be394f9f7b2daa9ecdf6fb1eb/Objects/setobject.c#L429-L441
It's difficult to devise a specific example, but luckily the theory is fairly simple :)
The set stores the keys using a hash of the value; as long as that hash is unique enough, you'll end up with the O(1) performance as expected.
If for some weird reason all of your items have different data but the same hash, they collide and the set will have to check all of them separately.
To illustrate, you can see the set as a dict like this:
import collections

your_set = collections.defaultdict(list)

def add(value):
    your_set[hash(value)].append(value)

def contains(value):
    # This is where your O(n) can occur: all values having the same hash()
    values = your_set.get(hash(value), [])
    for v in values:
        if v == value:
            return True
    return False
This is sometimes called the 'amortization' of a set or dictionary. It shows up now and then as an interview question. As #TimPeters says, resizing happens automagically at 2/3 capacity, so you'll only hit O(n) if you force the hash yourself.
In computer science, amortized analysis is a method for analyzing a given algorithm's complexity, or how much of a resource, especially time or memory, it takes to execute. The motivation for amortized analysis is that looking at the worst-case run time per operation, rather than per algorithm, can be too pessimistic.
/* GROWTH_RATE. Growth rate upon hitting maximum load.
 * Currently set to used*3.
 * This means that dicts double in size when growing without deletions,
 * but have more head room when the number of deletions is on a par with the
 * number of insertions. See also bpo-17563 and bpo-33205.
 *
 * GROWTH_RATE was set to used*4 up to version 3.2.
 * GROWTH_RATE was set to used*2 in version 3.3.0
 * GROWTH_RATE was set to used*2 + capacity/2 in 3.4.0-3.6.0.
 */
#define GROWTH_RATE(d) ((d)->ma_used*3)
More to the efficiency point: why 2/3? The Wikipedia article on hash tables has a nice graph of average insertion time,
https://upload.wikimedia.org/wikipedia/commons/1/1c/Hash_table_average_insertion_time.png
accompanying the article (the linear probing curve corresponds to the O(1)-to-O(n) behaviour we care about here; chaining is a more complicated hashing approach). See https://en.wikipedia.org/wiki/Hash_table for the complete discussion.
Say you have a set or dictionary which is stable and sits at 2/3 - 1 of its underlying capacity. Do you really want sluggish performance forever? You may wish to force resizing it upwards.
"if the keys are always known in advance, you can store them in a set and build your dictionaries from the set using dict.fromkeys()." plus some other useful, if dated, observations: Improving performance of very large dictionary in Python
For a good read on dictresize(): (dict was in Python before set)
https://github.com/python/cpython/blob/master/Objects/dictobject.c#L415

Optimizing element searching in a python Heap

I'm looking for an object in a python Heap. Technically, looking for the absence of it, but I assume the logic works similarly.
import heapq

heap = []
heapq.heappush(heap, (10, object))

if object not in [k for v, k in heap]:
    pass  ## code goes here ##
However, this check is the longest (most processor-intensive) part of my program at large numbers of elements in the heap.
Can this search be optimized? And if so, how?
heapq is a binary heap implementation of a priority queue. A binary heap makes a pretty efficient priority queue but, as you've discovered, finding an item requires a sequential search.
If all you need to know is whether an item is in the queue, then your best bet is probably to maintain a dictionary along with the queue. So when you add something to the queue, your code is something like:
"""
I'm not really a python guy, so the code probably has syntax errors.
But I think you get the idea.
"""
theQueue = [];
queueIndex = {};
queueInsert(item)
if (item.key in queueIndex)
// item already in queue. Exit.
heapq.heappush(theQueue, item);
queueIndex[item.key] = 1
queuePop()
result = heapq.heappop();
del queueIndex[result.key];
return result;
Note that if the item you're putting on the heap is a primitive like a number or a string, then you'd replace item.key with item.
Also, note that this will not work correctly if you can place duplicates in the queue. You could modify it to allow that, though. You'd just need to maintain the count of items so that you don't remove from the index until the count goes to 0.
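One possible sketch of that duplicate-friendly variant, assuming the heap items are hashable primitives such as numbers or strings (names are mine, not the answer's):

import heapq

the_queue = []
counts = {}     # item -> number of copies currently in the heap

def queue_insert(item):
    heapq.heappush(the_queue, item)
    counts[item] = counts.get(item, 0) + 1

def queue_pop():
    item = heapq.heappop(the_queue)
    counts[item] -= 1
    if counts[item] == 0:
        del counts[item]        # drop the key only once all copies are gone
    return item

def queue_contains(item):
    return item in counts       # O(1) membership check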
You can't do it with heapq, but here's a compatible implementation that works as long as the heap will not contain multiple copies of the same element.
https://github.com/elplatt/python-priorityq

Complexity does not match actual growth in running time? (python)

I ran 2 pieces of code in Python and measured the time each took to complete. The code is quite simple, just recursive maximums. Here it is:
1.
def max22(L, left, right):
    if left >= right:
        return L[int(left)]
    k = max22(L, left, (left + right - 1) // 2)
    p = max22(L, (right + left + 1) // 2, right)
    return max(k, p)

def max_list22(L):
    return max22(L, 0, len(L) - 1)

2.
def max2(L):
    if len(L) == 1:
        return L[0]
    l = max2(L[:len(L) // 2])
    r = max2(L[len(L) // 2:])
    return max(l, r)
The first one should run (imo) in O(logn), and the second one in O(n*logn).
However, I measured the running time for n=1000, n=2000 and n=4000,
and somehow the growth for both of the algorithms seems to be linear! How is this possible? Did I get the complexity wrong, or is it okay?
Thanks.
The first algorithm is not O(log n) because it checks the value of each element; it can be shown to be O(n).
As for the second, possibly you just couldn't notice the difference between n and n log n on such small scales.
Just because a function is splitting the search space by 2 and then recursively looking at each half does not mean that it has a log(n) factor in the complexity.
In your first solution, you are splitting the search space by 2, but then ultimately inspecting every element in each half. Unlike binary search which discards one half of the search space, you are inspecting both halves. This means nothing is discarded from the search and you ultimately end up looking at every element, making your complexity O(n). The same holds true for your second implementation.
Your first algorithm is O(n) on a normal machine, so it is not surprising that your testing indicated this. Your second algorithm is O(n log n), but it would be O(n) if you were using proper arrays instead of lists. Since Python builtin list operations are pretty fast, you may not have hit the logarithmic slowdown yet; try it with values more like n=4000000 and see what you get.
Note that, if you could run both recursive calls in parallel (with O(1) slicing), both algorithms could run in O(log n) time. Of course, you would need O(n) processors to do this, but if you were designing a chip, instead of writing a program, that kind of scaling would be straightforward...
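To see the difference yourself, a rough timing sketch along the lines of the larger-n suggestion above (function bodies copied from the question, sizes picked arbitrarily) could be:

import random
import timeit

def max22(L, left, right):                  # the question's index-based version
    if left >= right:
        return L[int(left)]
    k = max22(L, left, (left + right - 1) // 2)
    p = max22(L, (right + left + 1) // 2, right)
    return max(k, p)

def max2(L):                                # the question's slicing version
    if len(L) == 1:
        return L[0]
    return max(max2(L[:len(L) // 2]), max2(L[len(L) // 2:]))

for n in (1_000_000, 2_000_000, 4_000_000):
    L = [random.random() for _ in range(n)]
    t1 = timeit.timeit(lambda: max22(L, 0, len(L) - 1), number=3)
    t2 = timeit.timeit(lambda: max2(L), number=3)
    print("n=%d: index version %.2fs, slicing version %.2fs" % (n, t1, t2))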

Python memory explosion with embedded functions

I have used Python for a while and from time to time I run into memory explosion problems. I have searched several sources to resolve my question, such as
Memory profiling embedded python
and
https://mflerackers.wordpress.com/2012/04/12/fixing-and-avoiding-memory-leaks-in-python/
and
https://docs.python.org/2/reference/datamodel.html#object.del However, none of them works for me.
My current problem is the memory explosion when using embedded functions. The following code works fine:
class A:
    def fa:
        some operations
        get dictionary1
        combine dictionary1 to get string1
        dictionary1 = None
        return *string1*

    def fb:
        for i in range(0, j):
            call self.fa
            get dictionary2 by processing *string1*
            # dictionary1 and dictionary2 are basically the same.
            update *dictionary3* by processing dictionary2
            dictionary2 = None
        return *dictionary3*

class B:
    def ga:
        for n in range(0, m):
            call A.fb  # as one argument is updated dynamically, I have to call it within the loop
            process *dictionary3*
        return something
The problem arises when I notice that I don't need to combine dictionary1 into string1; I can directly pass dictionary1 to A.fb. I implemented it this way, but then the program becomes extremely slow and the memory usage explodes by more than 10 times. I have verified that both methods return the correct result.
Can anybody suggest why such a small modification results in such a large difference?
Previously, I also noticed this when I was levelizing nodes in a multi-source tree (with 100,000+ nodes). If I start levelizing from the source node with the largest height, the memory usage is 100 times worse than starting from the source node with the smallest height, while the levelization time is about the same.
This has baffled me for a long time. Thank you so much in advance!
If anybody interested, I can email you the source code for a more clear explanation.
The fact that you're solving the same problem shouldn't imply anything about the efficiency of the solution. The same point can be made about sorting arrays: you can use bubble-sort O(n^2), or merge-sort O(nlogn), or, if you can apply some restrictions, you can use a non-comparison sorting algorithm like radix or bucket-sort, which have linear runtime.
Starting to traverse from different nodes will generate different ways of traversing the graph, some of which might be inefficient (repeating nodes more times).
As for "combine dictionary1 to string1" - it might be a very expensive operation and since this function is called recursively (many times) - the performance could be significantly poorer. But that's just an educated guess and cannot be answered without having more details about the complexity of the operations performed in these functions.
