is this the right way to delete object inside dict - python

I wrote a class that inherits from dict and added a method to remove objects.
import time
class RoleCOList(dict):
    def __init__(self):
        dict.__init__(self)
    def recyle(self):
        '''
        remove roles too long no access
        '''
        checkTime = time.time()-60*30
        l = [k for k,v in self.items() if v.lastAccess>checkTime]
        for x in l:
            self.pop(x)
Isn't this too inefficient? I used two list loops, but I couldn't find another way.

At the SciPy conference last year, I attended a talk where the speaker said that any() and all() are fast ways to do a task in a loop. It makes sense; a for loop rebinds the loop variable on each iteration, whereas any() and all() simply consume the value.
Clearly, you use any() when you want to run a function that always returns a false value such as None. That way, the whole loop will run to the end.
checkTime = time.time() - 60*30
# use any() as a fast way to run a loop
# The .__delitem__() method always returns `None`, so this runs the whole loop
lst = [k for k in self.keys() if self[k].lastAccess > checkTime]
any(self.__delitem__(k) for k in lst)
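One caveat with this trick is worth a small sketch: any() stops at the first truthy result, so it only runs the whole loop when every call returns a false value. The dictionaries below are made-up examples, not the asker's data.

```python
# Hypothetical demo: any() stops at the first truthy result.
d = {"a": 1, "b": 2, "c": 3}
# dict.__delitem__ returns None, so any() consumes the whole generator.
any(d.__delitem__(k) for k in list(d))
remaining_after_delitem = dict(d)

d = {"a": 1, "b": 2, "c": 3}
# dict.pop returns the removed value; 1 is truthy, so any() stops after "a".
any(d.pop(k) for k in list(d))
remaining_after_pop = dict(d)
```

So swapping __delitem__ for pop here would silently leave entries behind whenever a popped value is truthy.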

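What about this? (In Python 3, wrap self.items() in list() so the dict is not resized while it is being iterated.)
_ = [self.pop(k) for k,v in list(self.items()) if v.lastAccess>checkTime]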

Since you don't need the list you generated, you could use generators and a snippet from this consume recipe. In particular, use collections.deque to run through a generator for you.
checkTime = time.time()-60*30
# Create a generator for all the values you will age off
age_off = (self.pop(k) for k in self.keys() if self[k].lastAccess>checkTime)
# Let deque handle iteration (in one shot, with little memory footprint)
collections.deque(age_off,maxlen=0)
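Since the dictionary is changed while age_off is being consumed, iterate over a snapshot of the keys. In Python 2, self.keys() returns a list, so the code above is safe; in Python 3 it returns a live view, so use list(self.keys()) instead. (Popping while iterating self.iteritems() in Python 2, or self.items() in Python 3, raises a RuntimeError.)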
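The snapshot idea can be sketched as a plain method that also runs on Python 3. This is a minimal sketch, not the asker's class: Role, RoleCache, recycle and max_age are hypothetical names, and it assumes the intent is to drop entries whose lastAccess is older than the cutoff (hence the `<` comparison).

```python
import time

class Role:
    """Hypothetical value type; only the lastAccess attribute matters here."""
    def __init__(self, last_access):
        self.lastAccess = last_access

class RoleCache(dict):
    def recycle(self, max_age=60 * 30):
        """Delete entries whose lastAccess is older than max_age seconds."""
        cutoff = time.time() - max_age
        # Snapshot the stale keys first so the dict is not resized mid-iteration.
        stale = [k for k, v in self.items() if v.lastAccess < cutoff]
        for k in stale:
            del self[k]
        return len(stale)

now = time.time()
cache = RoleCache(fresh=Role(now), old=Role(now - 3600))
removed = cache.recycle()
```

Two passes (one to collect keys, one to delete) are both linear, so this is unlikely to be the bottleneck unless the dict is very large.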

My (completely unreadable) solution:
from operator import delitem
map(lambda k: delitem(self,k), filter(lambda k: self[k].lastAccess<checkTime, iter(self)))
but at least it should be quite time and memory efficient ;-)

If performance is an issue, and if you will have large volumes of data, you might want to look into using a Python front-end for a system like memcached or redis; those can handle expiring old data for you.
http://memcached.org/
http://pypi.python.org/pypi/python-memcached/
http://redis.io/
https://github.com/andymccurdy/redis-py

Related

What's the point of using [object instance].__self__?

I was checking the code of the toolz library's groupby function in Python and I found this:
def groupby(key, seq):
    """ Group a collection by a key function
    """
    if not callable(key):
        key = getter(key)
    d = collections.defaultdict(lambda: [].append)
    for item in seq:
        d[key(item)](item)
    rv = {}
    for k, v in d.items():
        rv[k] = v.__self__
    return rv
Is there any reason to use rv[k] = v.__self__ instead of rv[k] = v?
This is a somewhat confusing trick to save a small amount of time:
We are creating a defaultdict with a factory function that returns a bound append method of a new list instance with [].append. Then we can just do d[key(item)](item) instead of d[key(item)].append(item), as we would have to if we had created a defaultdict containing lists. By not looking up append every time, we save a small amount of time.
But now the dict contains bound methods instead of the lists, so we have to get the original list instance back via __self__.
__self__ is an attribute described for instance methods that returns the original instance. You can verify that with this for example:
>>> a = []
>>> a.append.__self__ is a
True
This is a somewhat convoluted, but possibly more efficient approach to creating and using a defaultdict of lists.
First, remember that the default item is lambda: [].append. This means create a new list, and store a bound append method in the dictionary. This saves you a method bind on every further append to the same key, and the garbage collect that follows. For example, the following more standard approach is less efficient:
d = collections.defaultdict(list)
for item in seq:
    d[key(item)].append(item)
The problem then becomes how to get the original lists back out of the dictionary, since the reference is not stored explicitly. Luckily, bound methods have a __self__ attribute which does just that. Here, [].append.__self__ is a reference to the original [].
As a side note, the last loop could be a comprehension:
return {k: v.__self__ for k, v in d.items()}
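To see that both approaches produce the same grouping, here is a small self-contained sketch. groupby_bound and groupby_plain are hypothetical names for illustration; this is not the toolz source.

```python
import collections

def groupby_bound(key, seq):
    """Sketch of the bound-method trick (not the toolz source itself)."""
    d = collections.defaultdict(lambda: [].append)
    for item in seq:
        d[key(item)](item)   # call the stored bound append directly
    # Each value is a bound method; __self__ is the list it appends to.
    return {k: v.__self__ for k, v in d.items()}

def groupby_plain(key, seq):
    """The more conventional defaultdict(list) version."""
    d = collections.defaultdict(list)
    for item in seq:
        d[key(item)].append(item)
    return dict(d)

words = ["a", "bb", "c", "dd"]
by_len_bound = groupby_bound(len, words)
by_len_plain = groupby_plain(len, words)
```

Whether the bound-method version is measurably faster depends on the interpreter and the data, so it is worth timing before adopting it.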

Python 3 map is returning a list of NoneType objects instead of the type I modified?

I'm working on getting a better grasp of Python 3 fundamentals, specifically objects and modifying them in the context of a list (for now).
I created a simple class called MyThing() that just has a number, letter, and instance method for incrementing the number. My goal with this program was to create a list of 3 "MyThings", and manipulate the list in various ways. To start, I iterated through the list (obj_list_1) and incremented each number using each object's instance method. Easy enough.
What I'm trying to figure out how to do is perform the same operation in one line using the map function and lambda expressions (obj_list_2).
#!/usr/bin/env py
import copy
class MyThing:
    def __init__(self, letter='A', number=0):
        self.number = number
        self.letter = letter
    def __repr__(self) -> str:
        return("(letter={}, number={})".format(self.letter, self.number))
    def incr_number(self, incr=0):
        self.number += incr
# Test program to try different ways of manipulating lists
def main():
    obj1 = MyThing('A', 1)
    obj2 = MyThing('B', 2)
    obj3 = MyThing('C', 3)
    obj_list_1 = [obj1, obj2, obj3]
    obj_list_2 = copy.deepcopy(obj_list_1)
    # Show the original list
    print("Original List: {}".format(obj_list_1))
    # output: [(letter=A, number=1), (letter=B, number=2), (letter=C, number=3)]
    # Standard iterating over a list and incrementing each object's number.
    for obj in obj_list_1:
        obj.incr_number(1)
    print("For loop over List, adding one to each number:\n{}".format(obj_list_1))
    # output: [(letter=A, number=2), (letter=B, number=3), (letter=C, number=4)]
    # Try using map function with lambda
    obj_list_2 = list(map(lambda x: x.incr_number(1), obj_list_2))
    print("Using maps with incr_number instance method:\n{}".format(obj_list_2))
    # actual output: [None, None, None] <--- If I don't re-assign obj_list_2...it shows the proper sequence
    # expected output: [(letter=A, number=2), (letter=B, number=3), (letter=C, number=4)]
if __name__ == "__main__":
    main()
What I can't figure out is how to get map() to return the correct type, a list of "MyThing"s.
I understand that between Python 2 and Python 3, map changed to return an iterable instead of a list, so I made sure to cast the output. What I get is a list of 'None' objects.
What I noticed, though, is that if I don't re-assign obj_list_2, and instead just call list(map(lambda x: x.incr_number(1), obj_list_2)), then print obj_list_2 in the next line, the numbers get updated as I expect.
However, if I don't cast the map iterable and just do map(lambda x: x.incr_number(1), obj_list_2), the following print statement shows the list as having not been updated. I read in some documentation that the map function is lazy and doesn't do any work until it's consumed by something, so this makes sense.
Is there a way that I can get the output of list(map(lambda x: x.incr_number(1), obj_list_2)) to actually return my list of objects?
Are there any other cool one-liner solutions for updating a list of objects with their instance methods that I'm not thinking of?
TL;DR: Just use the for-loop. There's no advantage to using a map in this case.
Firstly:
You're getting a list of Nones because the mapped function returns None. That is, MyThing.incr_number() doesn't return anything, so it returns None implicitly.
Fewer lines is not necessarily better. Two simple lines are often easier to read than one complex line.
Notice that you're not creating a new list in the for-loop, you're only modifying the elements of the existing list.
list(map(lambda)) is longer and harder to read than a list comprehension:
[x.incr_number(1) for x in obj_list_2]
vs
list(map(lambda x: x.incr_number(1), obj_list_2))
Now, take a look at Is it Pythonic to use list comprehensions for just side effects? The top answer says no, it creates a list that never gets used. So there's your answer: just use the for-loop instead.
This is because your incr_number doesn't return anything. Change it to:
def incr_number(self, incr=0):
    self.number += incr
    return self
The loop is clearly better, but here's another way anyway. Your incr_number doesn't return anything, or rather it returns the default None, which is a false value. So if you simply append or x, you get the modified object instead of None.
Change
list(map(lambda x: x.incr_number(1), obj_list_2))
to this:
list(map(lambda x: x.incr_number(1) or x, obj_list_2))
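A short sketch makes the difference visible. Counter is a hypothetical stand-in for the MyThing class from the question, reduced to the one method that matters.

```python
class Counter:
    """Hypothetical stand-in for MyThing with a mutating method."""
    def __init__(self, number):
        self.number = number
    def incr_number(self, incr=0):
        self.number += incr   # mutates in place, implicitly returns None

things = [Counter(1), Counter(2), Counter(3)]
# The method returns None, so map collects three Nones (but still mutates).
nones = list(map(lambda x: x.incr_number(1), things))
# None is falsy, so `None or x` evaluates to x, the mutated object.
same_objects = list(map(lambda x: x.incr_number(1) or x, things))
numbers = [t.number for t in same_objects]
```

Note that each map call increments the objects once, so after both calls every number has gone up by two, and same_objects holds the very same instances as things.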

Is there a "Pythonic" way of creating a list with conditional items?

I've got this block of code in a real Django function. If certain conditions are met, items are added to the list.
ret = []
if self.taken():
    ret.append('taken')
if self.suggested():
    ret.append('suggested')
#.... many more conditions and appends...
return ret
It's very functional. You know what it does, and that's great...
But I've learned to appreciate the beauty of list and dict comprehensions.
Is there a more Pythonic way of phrasing this construct, perhaps that initialises and populates the array in one blow?
Create a mapping dictionary:
self.map_dict = {'taken': self.taken,
                 'suggested': self.suggested,
                 'foo' : self.bar}
[x for x in ['taken', 'suggested', 'foo'] if self.map_dict.get(x, lambda:False)()]
Related: Most efficient way of making an if-elif-elif-else statement when the else is done the most?
Not a big improvement, but I'll mention it:
def populate():
    if self.taken():
        yield 'taken'
    if self.suggested():
        yield 'suggested'
ret = list(populate())
Can we do better? I'm skeptical. Clearly there's a need of using another syntax than a list literal, because we no longer have the "1 expression = 1 element in result" invariant.
Edit:
There's a pattern to our data, and it's a list of (condition, value) pairs. We might try to exploit it using:
[value
 for condition, value
 in [(self.taken(), 'taken'),
     (self.suggested(), 'suggested')]
 if condition]
but this still is a restriction for how you describe your logic, still has the nasty side effect of evaluating all values no matter the condition (unless you throw in a ton of lambdas), and I can't really see it as an improvement over what we've started with.
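For what the "ton of lambdas" variant might look like, here is a minimal sketch. build, expensive and calls are hypothetical names; the point is only that a value wrapped in a lambda is computed only when its condition holds.

```python
def build(pairs):
    """pairs: iterable of (condition, zero-argument value factory)."""
    # A factory is called only when its condition is truthy.
    return [make() for condition, make in pairs if condition]

calls = []

def expensive(label):
    calls.append(label)   # record which values were actually computed
    return label.upper()

result = build([
    (True,  lambda: expensive('taken')),
    (False, lambda: expensive('suggested')),
])
```

This avoids the wasted evaluation, at the cost of the extra noise the answer warns about.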
For this very specific example, I could do:
return [x for x in ['taken', 'suggested', ...] if getattr(self, x)()]
But again, this only works where the item and method it calls to check have the same name, ie for my exact code. It could be adapted but it's a bit crusty. I'm very open to other solutions!
I don't know why we are appending strings that match the function names, but if this is a general pattern, we can use that. Functions have a __name__ attribute and I think it always contains what you want in the list.
So how about:
return [fn.__name__ for fn in (self.taken, self.suggested, foo, bar, baz) if fn()]
If I understand the problem correctly, this works just as well for non-member functions as for member functions.
EDIT:
Okay, let's add a mapping dictionary. And split out the function names into a tuple or list.
fns_to_check = (self.taken, self.suggested, foo, bar, baz)
# This holds only the exceptions; if a function isn't in here,
# we will use the .__name__ attribute.
fn_name_map = {foo:'alternate', bar:'other'}
def fn_name(fn):
    """Return name from exceptions map, or .__name__ if not in map"""
    return fn_name_map.get(fn, fn.__name__)
return [fn_name(fn) for fn in fns_to_check if fn()]
You could also just use @hcwhsa's mapping dictionary answer. The main difference here is I'm suggesting just mapping the exceptions.
In another instance (where a value will be defined but might be None - a Django model's fields in my case), I've found that just adding them and filtering works:
return filter(None, [self.user, self.partner])
If either of those is None, it will be removed from the list. It's a little more work than just checking, but it's still a fairly easy way of cleaning the output without writing a book.
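One thing to keep in mind with filter(None, ...) is that it drops every falsy item, not only None, and in Python 3 it returns an iterator rather than a list. The values below are made-up placeholders for the model fields.

```python
# Hypothetical values standing in for model fields.
user, partner, count = "alice", None, 0

# filter(None, ...) drops every falsy item, not only None.
truthy_only = list(filter(None, [user, partner, count]))

# To drop only None and keep 0 or "", test identity explicitly.
not_none = [v for v in [user, partner, count] if v is not None]
```

If 0, an empty string or an empty collection is a legitimate value, the explicit `is not None` test is the safer choice.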
One option is to have a "sentinel"-style object to take the place of list entries that fail the corresponding condition. Then a function can be defined to filter out the missing items:
# sentinel indicating a list element that should be skipped
Skip = object()
def drop_skips(itr):
    """returns an iterator yielding all but Skip objects from the given itr"""
    return filter(lambda v: v is not Skip, itr)
With this simple machinery, we come reasonably close to list-comprehension style syntax:
return drop_skips([
    'taken' if self.taken else Skip,
    'suggested' if self.suggested else Skip,
    100 if self.full else Skip,
    # many other values and conditions
])
ret = [
    *('taken' for _i in range(1) if self.taken()),
    *('suggested' for _i in range(1) if self.suggested()),
]
The idea is to use a generator expression to produce either a single item, 'taken', if self.taken() is true, or nothing at all if self.taken() is false, and then unpack the result into the list.
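An arguably easier-to-read spelling of the same unpacking idea uses a conditional expression that yields a one-element or empty list. This is a sketch with a hypothetical statuses function and plain booleans in place of the methods; unpacking inside a list display needs Python 3.5 or later.

```python
def statuses(taken, suggested):
    # Each starred expression contributes one item or nothing at all.
    return [
        *(['taken'] if taken else []),
        *(['suggested'] if suggested else []),
    ]

both = statuses(True, True)
only_suggested = statuses(False, True)
neither = statuses(False, False)
```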

Python 3 changing value of dictionary key in for loop not working

I have python 3 code that is not working as expected:
def addFunc(x,y):
    print (x+y)
def subABC(x,y,z):
    print (x-y-z)
def doublePower(base,exp):
    print(2*base**exp)
def RootFunc(inputDict):
    for k,v in inputDict.items():
        if v[0]==1:
            d[k] = addFunc(*v[1:])
        elif v[0] ==2:
            d[k] = subABC(*v[1:])
        elif v[0]==3:
            d[k] = doublePower(*v[1:])
d={"s1_7":[1,5,2],"d1_6":[2,12,3,3],"e1_3200":[3,40,2],"s2_13":[1,6,7],"d2_30":[2,42,2,10]}
RootFunc(d)
#test to make sure key var assignment works
print(d)
I get:
{'d2_30': None, 's2_13': None, 's1_7': None, 'e1_3200': None, 'd1_6': None}
I expected:
{'d2_30': 30, 's2_13': 13, 's1_7': 7, 'e1_3200': 3200, 'd1_6': 6}
What's wrong?
Semi related: I know dictionaries are unordered but is there any reason why python picked this order? Does it run the keys through a randomizer?
print does not return a value. It returns None, so every time you call your functions, they're printing to standard output and returning None. Try changing all print statements to return like so:
def addFunc(x,y):
    return x+y
This will give the value x+y back to whatever called the function.
Another problem with your code (unless you meant to do this) is that you define a dictionary d and then when you define your function, you are working on this dictionary d and not the dictionary that is 'input':
def RootFunc(inputDict):
    for k,v in inputDict.items():
        if v[0]==1:
            d[k] = addFunc(*v[1:])
Are you planning to always change d and not the dictionary that you are iterating over, inputDict?
There may be other issues as well (accepting a variable number of arguments within your functions, for instance), but it's good to address one problem at a time.
Additional Notes on Functions:
Here's some sort-of pseudocode that attempts to convey how functions are often used:
def sample_function(some_data):
    modified_data = []
    for element in some_data:
        processed = element  # do some processing here
        modified_data.append(processed)  # add the processed element to modified_data
    return modified_data
Functions are considered 'black box', which means you structure them so that you can dump some data into them and they always do the same stuff and you can call them over and over again. They will either return values or yield values or update some value or attribute or something (the latter are called 'side effects'). For the moment, just pay attention to the return statement.
Another interesting thing is that functions have 'scope' which means that when I just defined it with a fake-name for the argument, I don't actually have to have a variable called "some_data". I can pass whatever I want to the function, but inside the function I can refer to the fake name and create other variables that really only matter within the context of the function.
Now, if we run my function above, it will go ahead and process the data:
sample_function(my_data_set)
But this is often kind of pointless because the function is supposed to return something and I didn't do anything with what it returned. What I should do is assign the function's return value to a variable so I can keep the processed information.
my_modified_data = sample_function(my_data_set)
This is a really common way to use functions and you'll probably see it again.
One Simple Way to Approach Your Problem:
Taking all this into consideration, here is one way to solve your problem that comes from a really common programming paradigm:
def RootFunc(inputDict):
    temp_dict = {}
    for k,v in inputDict.items():
        if v[0]==1:
            temp_dict[k] = addFunc(*v[1:])
        elif v[0] ==2:
            temp_dict[k] = subABC(*v[1:])
        elif v[0]==3:
            temp_dict[k] = doublePower(*v[1:])
    return temp_dict
inputDict={"s1_7":[1,5,2],"d1_6":[2,12,3,3],"e1_3200":[3,40,2],"s2_13":[1,6,7],"d2_30":[2,42,2,10]}
final_dict = RootFunc(inputDict)
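Putting the two fixes together (return instead of print, and building a new dict), a complete runnable variant might look like the sketch below. The snake_case names and the OPERATIONS dispatch table are assumptions made for illustration; the if/elif chain works just as well.

```python
def add_func(x, y):
    return x + y

def sub_abc(x, y, z):
    return x - y - z

def double_power(base, exp):
    return 2 * base ** exp

# Hypothetical dispatch table: the first list element selects the function.
OPERATIONS = {1: add_func, 2: sub_abc, 3: double_power}

def root_func(input_dict):
    # Build a new dict instead of mutating a global while iterating.
    return {k: OPERATIONS[v[0]](*v[1:]) for k, v in input_dict.items()}

data = {"s1_7": [1, 5, 2], "d1_6": [2, 12, 3, 3], "e1_3200": [3, 40, 2],
        "s2_13": [1, 6, 7], "d2_30": [2, 42, 2, 10]}
final = root_func(data)
```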
As erewok stated, you are using print instead of return, which is the source of your error. As far as the ordering is concerned: in the Python versions where dictionaries are unordered, the order is not shuffled on purpose. It falls out of the hash-table implementation. (Since Python 3.7, dictionaries are guaranteed to preserve insertion order.)
Excerpt from the python doc: [...]A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the dictionary. [...]
Now key here is that the order of the element is not really random. I have often noticed that the order stays the same no matter how I construct a dictionary on some values... using lambda or just creating it outright, the order has always remained the same, so it can't be random, but it's definitely arbitrary.
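On current interpreters the question mostly goes away, because insertion order is kept. A minimal check, assuming Python 3.7 or later and using made-up values:

```python
d = {"s1_7": 7, "d1_6": 6, "e1_3200": 3200}
d["s2_13"] = 13            # new keys go to the end
d["s1_7"] = 70             # updating a value does not move the key
keys_in_order = list(d)
```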

Python functional evaluation efficiency

If I do this:
x=[(t,some_very_complex_computation(y)) for t in z]
Apparently some_very_complex_computation(y) is not dependent on t. So it should be evaluated only once. Is there any way to make Python aware of this, so it won't evaluate some_very_complex_computation(y) for every iteration?
Edit: I really want to do that in one line...
Usually you should follow San4ez's advice and just use a temporary variable here. I will still present a few techniques that might prove useful under certain circumstances:
In general, if you want to bind a name just for a sub-expression (which is usually why you need a temporary variable), you can use a lambda:
x = (lambda result=some_very_complex_computation(y): [(t, result) for t in z])()
In this particular case, the following is a quite clean and readable solution:
x = zip(z, itertools.repeat(some_very_complex_computation(y)))
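The reason this works is that the call is an ordinary argument expression, so it is evaluated once before zip starts pairing. A small sketch with a hypothetical stand-in function that records its calls (in Python 3, zip returns an iterator, hence the list() wrapper):

```python
import itertools

calls = []

def some_very_complex_computation(y):
    """Hypothetical stand-in that records how often it runs."""
    calls.append(y)
    return y * y

z = ["a", "b", "c"]
# The call is evaluated once, as an argument, before zip starts pairing.
x = list(zip(z, itertools.repeat(some_very_complex_computation(4))))
```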
A general note about automatic optimizations like these
In a dynamic language like Python, an implementation would have a very hard time to figure out that some_very_complex_computation is referentially transparent, that is, that it will always return the same result for the same arguments. You might want to look into a functional language like Haskell if you want magic like that.
"Explicit" pureness: Memoization
What you can do however is make some_very_complex_computation explicitly cache its return values for recent arguments:
from functools import lru_cache
@lru_cache()
def some_very_complex_computation(y):
    # ...
This is Python 3 (lru_cache was added in 3.2). In Python 2, you'd have to write the decorator yourself:
from functools import wraps
def memoize(f):
    cache = {}
    @wraps(f)
    def memoized(*args):
        if args in cache:
            return cache[args]
        res = cache[args] = f(*args)
        return res
    return memoized
@memoize
def some_very_complex_computation(x):
    # ...
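To check that memoization really collapses the repeated calls in the original comprehension, here is a small sketch. The function body and the calls list are hypothetical; note that this only helps when the arguments are hashable and the function has no side effects you rely on.

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=None)
def some_very_complex_computation(y):
    calls.append(y)        # record each real evaluation
    return y ** 2

z = [1, 2, 3]
x = [(t, some_very_complex_computation(10)) for t in z]
info = some_very_complex_computation.cache_info()
```

The comprehension still calls the wrapper once per element, but only the first call reaches the function body; the rest are cache hits.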
No, you should save the value in a variable:
result = some_very_complex_computation(y)
x = [(t, result) for t in z]
I understand the sometimes perverse urge to get everything into one line, but at the same time it is good to keep things readable. You may consider this more readable than the lambda version:
x=[(t,s) for s in [some_very_complex_computation(y)] for t in z]
However, you are probably better going for the answer by San4ez as being simple, readable (and possibly faster than creating and iterating through a one element list).
You can either:
Move the call out of the list comprehension
or
Use memoization (i.e. when some_very_complex_computation(y) gets called, store the result in a dictionary, and if it gets called again with the same value, just return the value stored in the dict)
TL;DR version
zip(z, [long_computation(y)] * len(z))
Original answer:
As a rule of thumb, if you have some computation with a long execution time, it would be a good idea to cache the result directly in the function like this:
_cached_results = {}
def computation(v):
    if v in _cached_results:
        return _cached_results[v]
    # otherwise do the computation here...
    _cached_results[v] = result
    return result
This would solve your problem too.
On one-liners
Doing one-liners for the sake of them is poor coding, yet... if you really wanted to do it in one line:
>>> def func(v):
...     print 'executing func'
...     return v * 2
...
>>> z = [1, 2, 3]
>>> zip(z, [func(10)] * len(z))
executing func
[(1, 20), (2, 20), (3, 20)]
@San4ez has given the traditional, correct, simple, and beautiful answer.
In the spirit of the one-liner though, here's a technique for putting it all in one statement. The core idea is to use a nested for-loop to pre-evaluate subexpressions:
result = [(t, result) for result in [some_very_complex_computation(y)] for t in z]
If that blows your mind, you could just use a semicolon to put multiple statements on one line:
result = some_very_complex_computation(y); x = [(t, result) for t in z]
Python can't know whether the function has side effects or returns a different value from run to run, so you have to move the call out of the list comprehension manually.
