How to create `if in` function check? - python

I have been investigating and found that using if in is the fastest approach compared to the alternatives (see benchmark), and I have been trying to create a function where I can pass arguments describing the path the if in checks should follow, e.g.
def main():
    d = {"foo": "spam"}
    if "bar" in d:
        if "eggs" in d["bar"]:
            d["bar"]["eggs"]
        else:
            {}
    else:
        {}
Instead of writing long code like this, I would like a function that takes the keys as arguments, e.g. get_path(json_data, 'foo', 'eggs'), which would do something similar to the code above and return the value if it is found, or an empty result otherwise.
My question is: how can I create a function that takes the keys as arguments, does the if in checks, and returns the value if it's found?

You could pass your keys as tuple/list:
def main(data, keys):
    for k in keys:
        if k not in data:
            return {}
        data = data[k]
    return data
d = {"foo": "spam", "bar": {"eggs": "HAM!"}}
print(main(d, ('bar', 'eggs')))
Out:
HAM!

This is a nice little problem that has a fairly easy solution as long as everything is dicts:
def get_path(data, *path):
    node = data
    for step in path:
        if step in node:
            node = node[step]
        else:
            return {} # None might be more appropriate here
    return node
Note that it won't work quite right if you encounter a list along the way: although lists support [] and they support in, in means something different to them ("is this value found", rather than "is this key found"), so the test generally won't succeed.
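To make the list caveat concrete, here is a minimal sketch of a variant that indexes first and catches the failure, so that dict keys and list indices are handled the same way. The `default` keyword argument and the sample data are assumptions added for illustration; they are not part of the answer above.

```python
def get_path(data, *path, default=None):
    """Follow `path` through nested dicts and lists; return `default` on any miss."""
    node = data
    for step in path:
        try:
            # Subscripting works for dict keys and for list/tuple indices alike.
            node = node[step]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key; IndexError: list index out of range;
            # TypeError: node is not subscriptable, or the index has the wrong type.
            return default
    return node


d = {"foo": "spam", "bar": {"eggs": ["HAM!", "more ham"]}}
print(get_path(d, "bar", "eggs", 0))            # HAM!
print(get_path(d, "bar", "eggs", 5))            # None
print(get_path(d, "foo", "eggs", default={}))   # {}
```

One design consequence to be aware of: strings are subscriptable too, so get_path(d, "foo", 0) returns "s" rather than the default.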

Related

python dictionary getter with default value not behaving as expected [duplicate]

I am trying to provide a function as the default argument for the dictionary's get function, like this
def run():
    print("RUNNING")
test = {'store':1}
test.get('store', run())
However, when this is run, it displays the following output:
RUNNING
1
so my question is, as the title says, is there a way to provide a callable as the default value for the get method without it being called if the key exists?
Another option, assuming you don't intend to store falsy values in your dictionary:
test.get('store') or run()
In python, the or operator does not evaluate arguments that are not needed (it short-circuits)
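To illustrate both the short-circuit and the falsy-value caveat, here is a small sketch. The `calls` list and the extra 'count' key are made up for the demonstration.

```python
calls = []

def run():
    # Hypothetical fallback that records each time it is actually called.
    calls.append("run")
    return "fallback"

test = {"store": 1, "count": 0}

a = test.get("store") or run()    # key present and truthy: run() is skipped
b = test.get("missing") or run()  # key absent: get() returns None, so run() is called
c = test.get("count") or run()    # key present but falsy (0): run() is called anyway

print(a, b, c)     # 1 fallback fallback
print(len(calls))  # 2
```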
If you do need to support falsy values, then you can use get_or_run(test, 'store', run) where:
def get_or_run(d, k, f):
sentinel = object() # guaranteed not to be in d
v = d.get(k, sentinel)
return f() if v is sentinel else v
See the discussion in the answers and comments of "dict.get() method returns a pointer". You have to break it into two steps.
Your options are:
1. Use a defaultdict with the callable if you always want that value as the default, and want to store it in the dict.
2. Use a conditional expression:
   item = test['store'] if 'store' in test else run()
3. Use try / except:
   try:
       item = test['store']
   except KeyError:
       item = run()
4. Use get:
   item = test.get('store')
   if item is None:
       item = run()
And variations on those themes.
glglgl shows a way to subclass defaultdict; you can also just subclass dict for some situations:
def run():
    print("RUNNING")
    return 1
class dict_nokeyerror(dict):
    def __missing__(self, key):
        return run()
test = dict_nokeyerror()
print(test['a'])
# RUNNING
# 1
Subclassing only really makes sense if you always want the dict to have some nonstandard behavior; if you generally want it to behave like a normal dict and just want a lazy get in one place, use one of my methods 2-4.
I suppose you want to have the callable applied only if the key does not exist.
There are several approaches to do so.
One would be to use a defaultdict, which calls run() if key is missing.
from collections import defaultdict
def run():
    print("RUNNING")
test = {'store':1}
test.get('store', run())
test = defaultdict(run, store=1) # provides a value for store
test['store'] # gets 1
test['runthatstuff'] # gets None
Another, rather ugly, approach would be to store only callables in the dict, each returning the appropriate value.
test = {'store': lambda:1}
test.get('store', run)() # -> 1
test.get('runrun', run)() # -> None, prints "RUNNING".
If you want to have the return value depend on the missing key, you have to subclass defaultdict:
class mydefaultdict(defaultdict):
    def __missing__(self, key):
        val = self[key] = self.default_factory(key)
        return val
d = mydefaultdict(lambda k: k*k)
d[10] # yields 100
@mydefaultdict # decorators are fine
def d2(key):
    return -key
d2[5] # yields -5
And if you want not to add this value to the dict for the next call, you have a
def __missing__(self, key): return self.default_factory(key)
instead which calls the default factory every time a key: value pair was not explicitly added.
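A minimal sketch of that non-storing variant follows. The class name KeyAwareDefaultDict is hypothetical, and the None check is an added assumption so that a missing factory still raises KeyError as a plain defaultdict would.

```python
from collections import defaultdict

class KeyAwareDefaultDict(defaultdict):
    """Hypothetical name: passes the missing key to default_factory, stores nothing."""
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        # Compute the value on every miss; the dict itself is left unchanged.
        return self.default_factory(key)

d = KeyAwareDefaultDict(lambda k: k * k)
print(d[10])     # 100
print(10 in d)   # False, nothing was stored
d[10] = -1       # an explicitly added pair takes precedence from now on
print(d[10])     # -1
```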
If you only know what the callable is likely to be at the get call site you could subclass dict, something like this:
class MyDict(dict):
    def get_callable(self,key,func,*args,**kwargs):
        '''Like ordinary get but uses a callable to
        generate the default value'''
        if key not in self:
            val = func(*args,**kwargs)
        else:
            val = self[key]
        return val
This can then be used like so:-
>>> d = MyDict()
>>> d.get_callable(1,complex,2,3)
(2+3j)
>>> d[1] = 2
>>> d.get_callable(1,complex,2,3)
2
>>> def run(): print("run")
>>> repr(d.get_callable(1,run))
'2'
>>> repr(d.get_callable(2,run))
run
'None'
This is probably most useful when the callable is expensive to compute.
I have a util directory in my project with qt.py, general.py, geom.py, etc. In general.py I have a bunch of python tools like the one you need:
# Use whenever you need a lambda default
def dictGet(dict_, key, default):
    if key not in dict_:
        return default()
    return dict_[key]
Add *args, **kwargs if you want to support calling default more than once with differing args:
def dictGet(dict_, key, default, *args, **kwargs):
    if key not in dict_:
        return default(*args, **kwargs)
    return dict_[key]
Here's what I use:
def lazy_get(d, k, f):
    return d[k] if k in d else f(k)
The fallback function f takes the key as an argument, e.g.
lazy_get({'a': 13}, 'a', lambda k: k) # --> 13
lazy_get({'a': 13}, 'b', lambda k: k) # --> 'b'
You would obviously use a more meaningful fallback function, but this illustrates the flexibility of lazy_get.
Here's what the function looks like with type annotation:
from typing import Callable, Mapping, TypeVar
K = TypeVar('K')
V = TypeVar('V')
def lazy_get(d: Mapping[K, V], k: K, f: Callable[[K], V]) -> V:
    return d[k] if k in d else f(k)

A memoized function that takes a tuple of strings to return an integer?

Suppose I have arrays of tuples like so:
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
I am trying to turn these arrays into numerical vectors with each dimension representing a feature.
So the expected output would be something like:
amod = [1, 0, 1] # or [1, 1, 1]
bmod = [1, 1, 2] # or [1, 2, 2]
So the vector that gets created is dependent on what it has seen before (i.e rectangle is still coded as 1 but the new value 'large' gets coded as a next step up as 2).
I think I could use some combination of yield and a memoize function to help me with this. This is what I've tried so far:
def memoize(f):
    memo = {}
    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        return memo[x]
    return helper
@memoize
def verbal_to_value(tup):
    u = 1
    if tup[0] == 'shape':
        yield u
        u += 1
    if tup[0] == 'fill':
        yield u
        u += 1
    if tup[0] == 'size':
        yield u
        u += 1
But I keep getting this error:
TypeError: 'NoneType' object is not callable
Is there a way I can create this function that has a memory of what it has seen? Bonus points if it could add keys dynamically so I don't have to hardcode things like 'shape' or 'fill'.
First off: this is my preferred implementation of the memoize
decorator, mostly because of speed ...
def memoize(f):
    class memodict(dict):
        __slots__ = ()
        def __missing__(self, key):
            self[key] = ret = f(key)
            return ret
    return memodict().__getitem__
except for a few edge cases it has the same effect as yours:
def memoize(f):
    memo = {}
    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        #else:
        #    pass
        return memo[x]
    return helper
but is somewhat faster because the if x not in memo: check happens in
native code instead of in Python. To understand it you merely need
to know that under normal circumstances, to interpret adict[key],
Python calls adict.__getitem__(key); if adict doesn't contain key,
__getitem__() calls adict.__missing__(key), so we can leverage the
Python magic method protocol for our gain...
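One detail worth seeing in action: the __missing__ hook is only reached through subscript access, not through .get() or in. A short sketch, where the Squares class is purely illustrative:

```python
class Squares(dict):
    """Illustrative dict subclass: computes and caches key*key for missing keys."""
    def __missing__(self, key):
        self[key] = value = key * key
        return value

s = Squares()
print(s[4])        # 16, computed by __missing__ and then cached
print(s.get(5))    # None: dict.get() does not call __missing__
print(5 in s)      # False
print(sorted(s))   # [4]
```

This is why the memoize above returns the bound __getitem__ method rather than get.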
#This is the first idea I had for how I would implement your
#verbal_to_value() using memoization:
from collections import defaultdict
work=defaultdict(set)
@memoize
def verbal_to_value(kv):
    k, v = kv
    aset = work[k] #work creates a new set, if not already created.
    aset.add(v) #add value if not already added
    return len(aset)
including the memoize decorator, that's 15 lines of code...
#test suite:
def vectorize(alist):
    return [verbal_to_value(kv) for kv in alist]
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
print (vectorize(a)) #shows [1,1,1]
print (vectorize(b)) #shows [1,2,2]
defaultdict is a powerful object that has almost the same logic
as memoize: a standard dictionary in every way, except that when the
lookup fails, it runs the callback function to create the missing
value. In our case set()
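A short sketch of that behaviour, using an illustrative variable name `seen`. It also shows a side effect that is easy to overlook: merely looking up a missing key creates it.

```python
from collections import defaultdict

seen = defaultdict(set)          # set() is called for every missing key
seen["shape"].add("rectangle")
seen["shape"].add("rectangle")   # duplicates are ignored by the set
seen["fill"].add("no")
seen["fill"].add("yes")

print(len(seen["shape"]))   # 1
print(len(seen["fill"]))    # 2
print(len(seen["size"]))    # 0, the lookup itself created an empty set
print("size" in seen)       # True
```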
Unfortunately this problem requires either access to the tuple that
is being used as the key, or to the dictionary state itself, with the
result that we cannot just write a simple function for .default_factory.
But we can write a new object based on the memoize/defaultdict pattern:
#This is how I would implement your verbal_to_value without
#memoization, though the worker class is so similar to @memoize
#that it's easy to see why memoize is a good pattern to work from:
class sloter(dict):
    __slots__ = ()
    def __missing__(self,key):
        self[key] = ret = len(self) + 1
        #this + 1 bothers me, why can't these vectors be 0 based? ;)
        return ret
from collections import defaultdict
work2 = defaultdict(sloter)
def verbal_to_value2(kv):
    k, v = kv
    return work2[k][v]
#~10 lines of code?
#test suite2:
def vectorize2(alist):
    return [verbal_to_value2(kv) for kv in alist]
print (vectorize2(a)) #shows [1,1,1]
print (vectorize2(b)) #shows [1,2,2]
You might have seen something like sloter before, because it's
sometimes used for exactly this sort of situation: converting member
names to numbers and back. Because of this, we have the advantage of
being able to reverse things like this:
def unvectorize2(a_vector, pattern=('shape','fill','size')):
    reverser = [{v:k2 for k2,v in work2[k].items()} for k in pattern]
    for index, vect in enumerate(a_vector):
        yield pattern[index], reverser[index][vect]
print (list(unvectorize2(vectorize2(a))))
print (list(unvectorize2(vectorize2(b))))
But I saw those yields in your original post, and they've got me
thinking... what if there was a memoize / defaultdict like object
that could take a generator instead of a function and knew to just
advance the generator rather than calling it. Then I realized ...
that yes generators come with a callable called __next__() which
meant that we didn't need a new defaultdict implementation, just a
careful extraction of the correct member function...
def count(start=0): #same as: from itertools import count
    while True:
        yield start
        start += 1
#so we could get the exact same behavior as above, (except faster)
#by saying:
sloter3=lambda :defaultdict(count(1).__next__)
#and then
work3 = defaultdict(sloter3)
#or just:
work3 = defaultdict(lambda :defaultdict(count(1).__next__))
#which yes, is a bit of a mindwarp if you've never needed to do that
#before.
#the outer defaultdict interprets the first item. Every time a new
#first item is received, the lambda is called, which creates a new
#count() generator (starting from 1), and passes its .__next__ method
#to a new inner defaultdict.
def verbal_to_value3(kv):
    k, v = kv
    return work3[k][v]
#you *could* call that 8 lines of code, but we managed to use
#defaultdict twice, and didn't need to define it, so I wouldn't call
#it 'less complex' or anything.
#test suite3:
def vectorize3(alist):
    return [verbal_to_value3(kv) for kv in alist]
print (vectorize3(a)) #shows [1,1,1]
print (vectorize3(b)) #shows [1,2,2]
#so yes, that can also work.
#and since the internal state in `work3` is stored in the exact same
#format, it can be accessed the same way as `work2` to reconstruct input
#from output.
def unvectorize3(a_vector, pattern=('shape','fill','size')):
    reverser = [{v:k2 for k2,v in work3[k].items()} for k in pattern]
    for index, vect in enumerate(a_vector):
        yield pattern[index], reverser[index][vect]
print (list(unvectorize3(vectorize3(a))))
print (list(unvectorize3(vectorize3(b))))
Final comments:
Each of these implementations suffers from storing state in a global
variable, which I find anti-aesthetic, but depending on what you're
planning to do with that vector later, that might be a feature, as I
demonstrated.
Edit:
Another day of meditating on this, and the sorts of situations where I might need it,
I think that I'd encapsulate this feature like this:
from collections import defaultdict
from itertools import count
class slotter4:
    def __init__(self):
        #keep track what order we expect to see keys
        self.pattern = defaultdict(count(1).__next__)
        #keep track of what values we've seen and what number we've assigned to mean them.
        self.work = defaultdict(lambda :defaultdict(count(1).__next__))
    def slot(self, kv, i=False):
        """used to be named verbal_to_value"""
        k, v = kv
        if i and i != self.pattern[k]:# keep track of order we saw initial keys
            raise ValueError("Input fields out of order")
            #in theory we could ignore this error, and just know
            #that we're going to default to the field order we saw
            #first. Or we could just not keep track, which might be
            #required, if our code runs too slow, but then we cannot
            #make pattern optional in .unvectorize()
        return self.work[k][v]
    def vectorize(self, alist):
        return [self.slot(kv, i) for i, kv in enumerate(alist,1)]
        #if we're not keeping track of field pattern, we could do this instead
        #return [self.work[k][v] for k, v in alist]
    def unvectorize(self, a_vector, pattern=None):
        if pattern is None:
            pattern = [k for k,v in sorted(self.pattern.items(), key=lambda a:a[1])]
        #use this instance's own state (self.work) rather than the global work3
        reverser = [{v:k2 for k2,v in self.work[k].items()} for k in pattern]
        return [(pattern[index], reverser[index][vect])
                for index, vect in enumerate(a_vector)]
#test suite4:
s = slotter4()
if __name__=='__main__':
    Av = s.vectorize(a)
    Bv = s.vectorize(b)
    print (Av) #shows [1,1,1]
    print (Bv) #shows [1,2,2]
    print (s.unvectorize(Av))#shows a
    print (s.unvectorize(Bv))#shows b
else:
    #run the test silently, and only complain if something has broken
    assert s.unvectorize(s.vectorize(a))==a
    assert s.unvectorize(s.vectorize(b))==b
Good luck out there!
Not the best approach, but may help you to figure out a better solution
class Shape:
    counter = {}
    def to_tuple(self, tuples):
        self.tuples = tuples
        self._add()
        l = []
        for i,v in self.tuples:
            l.append(self.counter[i][v])
        return l
    def _add(self):
        for i,v in self.tuples:
            if i in self.counter.keys():
                if v not in self.counter[i]:
                    self.counter[i][v] = max(self.counter[i].values()) +1
            else:
                self.counter[i] = {v: 0}
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
s = Shape()
s.to_tuple(a)
s.to_tuple(b)

Adding items to a list if it's not a function

I'm trying to write a function right now, and its purpose is to go through an object's __dict__ and add an item to a dictionary if the item is not a function.
Here is my code:
def dict_into_list(self):
    result = {}
    for each_key,each_item in self.__dict__.items():
        if inspect.isfunction(each_key):
            continue
        else:
            result[each_key] = each_item
    return result
If I'm not mistaken, inspect.isfunction is supposed to recognize lambdas as functions as well, correct? However, if I write
c = some_object(3)
c.whatever = lambda x : x*3
then my function still includes the lambda. Can somebody explain why this is?
For example, if I have a class like this:
class WhateverObject:
    def __init__(self,value):
        self._value = value
    def blahblah(self):
        print('hello')
a = WhateverObject(5)
So if I say print(a.__dict__), it should give back {'_value': 5}
You are actually checking whether each_key is a function, which it most likely is not. You have to check the value instead, like this
if inspect.isfunction(each_item):
You can confirm this, by including a print, like this
def dict_into_list(self):
    result = {}
    for each_key, each_item in self.__dict__.items():
        print(type(each_key), type(each_item))
        if not inspect.isfunction(each_item):
            result[each_key] = each_item
    return result
Also, you can write your code with dictionary comprehension, like this
def dict_into_list(self):
    return {key: value for key, value in self.__dict__.items()
            if not inspect.isfunction(value)}
I can think of an easy way to find the variables of an object using Python's built-in dir and callable functions instead of the inspect module.
{var: getattr(self, var) for var in dir(self) if not callable(getattr(self, var))}
Please note that this assumes you have not overridden the __getattr__ method of the class to do something other than getting the attributes.
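Since dir() also lists class-level and dunder attributes (for example __module__, which is a non-callable string), a narrower sketch restricts the scan to the instance's own __dict__ via vars(). The method name data_attributes is hypothetical; the class is the one from the question.

```python
class WhateverObject:
    def __init__(self, value):
        self._value = value

    def blahblah(self):
        print('hello')

    def data_attributes(self):
        # vars(self) is the instance __dict__; methods live on the class,
        # so only per-instance callables (such as an assigned lambda) need filtering.
        return {name: value for name, value in vars(self).items()
                if not callable(value)}

a = WhateverObject(5)
a.whatever = lambda x: x * 3
print(a.data_attributes())   # {'_value': 5}
```

Note that callable() is broader than inspect.isfunction(): it also excludes classes and any object that defines __call__.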

Is there an "infinite dictionary" in Python?

Is there something like an "infinite dictionary" in Python?
More precisely, is there something where
- i can put in values like in a dictionary,
- but maybe also a function which tells me how to map a key to a value,
- and maybe also something that maps a key to a (finite) set of keys and then gives the corresponding value?
Formulated in another way, what I want to have is the following "thing":
I initialize it in a way (give values, functions, whatever) and then it just gives me for each key a value (on request).
You will want to create a class with the special method __getitem__(self,key) that returns the appropriate value for that key.
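As a rough sketch of what such a class might look like, assuming explicitly stored values should win over the computed ones. The class name FunctionBackedMapping and its constructor arguments are made up for illustration.

```python
class FunctionBackedMapping:
    """Hypothetical sketch: explicit values first, then a fallback function."""
    def __init__(self, func, values=None):
        self._func = func
        self._values = dict(values or {})

    def __getitem__(self, key):
        # Stored values take precedence; every other key is computed on demand.
        if key in self._values:
            return self._values[key]
        return self._func(key)

    def __setitem__(self, key, value):
        self._values[key] = value

squares = FunctionBackedMapping(lambda k: k * k, {0: "zero"})
squares[3] = "three"
print(squares[0], squares[3], squares[12])   # zero three 144
```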
What you need is called a "function".
Now, on a less sarcastic note: I don't know exactly what you are trying to achieve, but here's an example:
You want a piece of code that returns the nth element in an arithmetic progression. You can do it this way with functions:
def progression(first_element, ratio):
    def nth_element(n):
        return n*ratio + first_element
    return nth_element
my_progression = progression(2, 32)
print(my_progression(17)) # prints 546
This can be extended if, for example, you need a function that retains state.
Hope this helps
If you want normal behaviour for existing keys, and special behavior for non-existing keys, there's the __missing__ method that's called for missing keys.
class funny_dict(dict):
    def __missing__(self, key):
        return "funny" * key
d = funny_dict()
d[1] = "asdf"
d[3] = 3.14
for i in range(5):
    print(i, d[i])
print(d)
Output:
0
1 asdf
2 funnyfunny
3 3.14
4 funnyfunnyfunnyfunny
{1: 'asdf', 3: 3.14}
An easy way to do this would be to use a function object for both use cases. If you want to use a key-value function, you just use it directly as a reference. To adapt an ordinary dictionary to this interface, you can wrap it in a lambda. Like so:
# Use function as dictionary
def dict_func(key):
    return key * key
dictionary = dict_func
print(dictionary(2)) # prints 4
# Use normal dictionary with the same interface
normal_dict = {1: 1, 2: 4, 3: 9}
dictionary = lambda key: normal_dict[key]
print(dictionary(2)) # also prints 4
# Lambda functions store references to the variables they use,
# so this works too:
def fn_dict(normal_dict):
    return lambda key: normal_dict[key]
dictionary = fn_dict({1: 1, 2: 4, 3: 9})
print(dictionary(3)) # prints 9
I think you want something like this, where your dict acts like a normal dictionary but changes its behavior for special keys, e.g.
class InfiniteDict(dict):
    def __init__(self, *args, **kwargs):
        self.key_funcs = kwargs.pop('key_funcs', [])
        super(InfiniteDict, self).__init__(*args, **kwargs)
    def __getitem__(self, key):
        try:
            return super(InfiniteDict, self).__getitem__(key)
        except KeyError:
            return self._get_value_from_functions(key)
    def _get_value_from_functions(self, key):
        """
        go thru list of user defined functions and return first match
        """
        for key_func in self.key_funcs:
            try:
                return key_func(key)
            except KeyError:
                pass
        raise KeyError(key)
def double_even_int(key):
    try:
        if int(key)%2 == 0:
            return int(key)*2
        else:
            raise KeyError(key)
    except ValueError:
        raise KeyError(key)
def tripple_odd_int(key):
    try:
        if int(key)%2 == 1:
            return int(key)*3
        else:
            raise KeyError(key)
    except ValueError:
        raise KeyError(key)
inf = InfiniteDict(key_funcs=[double_even_int, tripple_odd_int])
inf['a'] = 'A'
print(inf['a'], inf[1], inf['2'])
output:
A 3 4

inserting into python dictionary

The default behavior for python dictionary is to create a new key in the dictionary if that key does not already exist. For example:
d = {}
d['did not exist before'] = 'now it does'
This is all well and good for most purposes, but what if I'd like Python to do nothing if the key isn't already in the dictionary? In my situation:
for x in exceptions:
    if masterlist.has_key(x):
        masterlist[x] = False
In other words, I don't want some incorrect elements in exceptions to corrupt my masterlist. Is this as simple as it gets? It FEELS like I should be able to do this in one line inside the for loop (i.e., without explicitly checking that x is a key of masterlist).
UPDATE:
To me, my question is asking about the lack of a parallel between a list and a dict. For example:
l = []
l[0] = 2 #fails
l.append(2) #works
With the subclassing answer, you could modify the dictionary (maybe "safe_dict" or "explicit_dict") to do something similar:
d = {}
d['a'] = '1' #would fail in my world
d.insert('a','1') #what my world is missing
You could use .update:
masterlist.update((x, False) for x in exceptions if masterlist.has_key(x))
You can inherit from the dict class and override its __setitem__ to check for the existence of the key (or do the same by monkey-patching only one instance).
Sample class:
class a(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        dict.__setitem__(self, 'a', 'b')
    def __setitem__(self, key, value):
        if self.has_key(key):
            dict.__setitem__(self, key, value)
a = a()
print a['a'] # prints 'b'
a['c'] = 'd'
# print a['c'] - would fail
a['a'] = 'e'
print a['a'] # prints 'e'
You could also use some function to make setting values without checking for existence simpler.
However, I thought it would be shorter... Don't use it unless you need it in many places.
You can also use in instead of has_key, which is a little nicer.
for x in exceptions:
    if x in masterlist:
        masterlist[x] = False
But I don't see the issue with having an if statement for this purpose.
For long lists, try intersecting the two collections with the & operator on set() objects, wrapped in parentheses:
for x in (set(exceptions) & set(masterlist)):
    masterlist[x] = False
    #or masterlist[x] = exceptions[x]
It'll improve the reading and the iterations at the same time by reading the masterlist's keys only once.
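In Python 3 the same idea can be written against the dictionary's keys view, which supports set operations directly, so set(masterlist) does not need to be built. A minimal sketch with made-up sample data:

```python
masterlist = {"a": True, "b": True, "c": True}   # sample data, for illustration only
exceptions = ["b", "c", "zzz"]                   # "zzz" is not a key of masterlist

# masterlist.keys() is a set-like view; & returns a new set, so it is safe
# to assign to masterlist inside the loop.
for x in masterlist.keys() & set(exceptions):
    masterlist[x] = False

print(masterlist)   # {'a': True, 'b': False, 'c': False}
```

Only existing keys are reassigned, so no new key is created for "zzz".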
