Handle nested fields with conversion types in string with string.Formatter - python

Update 2
Alright, my answer to this question is not a complete solution to what I originally wanted, but it's OK for simpler things like filename templating (what I originally intended to use this for). I have yet to come up with a solution for recursive templating. It might not matter to me, though, as I have reevaluated what I really need. It's possible I'll need bigger guns in the future, but then I'll probably just choose a more advanced templating engine instead of reinventing the wheel.
Update
Ok I realize now string.Template probably is the better way to do this. I'll answer my own question when I have a working example.
I want to format strings by grouping keys and arbitrary text together in a nested manner, like so:
# conversions (!):
# u = upper case
# l = lower case
# c = capital case
# t = title case
fmt = RecursiveNamespaceFormatter(globals())
greeting = 'hello'
person = 'foreName surName'
world = 'WORLD'
sample = 'WELL {greeting!u} {super {person!t}, {tHiS iS tHe {world!t}!l}!c}!'
print(fmt.format(sample))
# output: WELL HELLO Super Forename Surname, this is the World!
I've subclassed string.Formatter to populate the nested fields, which I retrieve with a regex, and it works fine, except that fields with a conversion type don't get converted.
import re
from string import Formatter
class RecursiveNamespaceFormatter(Formatter):
    def __init__(self, namespace={}):
        Formatter.__init__(self)
        self.namespace = namespace
    def vformat(self, format_string, *args, **kwargs):
        def func(i):
            i = i.group().strip('{}')
            return self.get_value(i,(),{})
        # Replace the innermost {...} fields first
        format_string = re.sub(r'\{(?:[^}{]*)\}', func, format_string)
        try:
            return super().vformat(format_string, args, kwargs)
        except ValueError:
            return self.vformat(format_string)
    def get_value(self, key, args, kwds):
        if isinstance(key, str):
            try:
                # Check explicitly passed arguments first
                return kwds[key]
            except KeyError:
                return self.namespace.get(key, key) # return key if not found (e.g. key == "this is the World")
        else:
            # Positional (integer) keys fall through to the default lookup
            return super().get_value(key, args, kwds)
    def convert_field(self, value, conversion):
        if conversion == "u":
            return str(value).upper()
        elif conversion == "l":
            return str(value).lower()
        elif conversion == "c":
            return str(value).capitalize()
        elif conversion == "t":
            return str(value).title()
        # Do the default conversion or raise error if no matching conversion found
        return super().convert_field(value, conversion)
# output: WELL hello!u super foreName surName!t, tHiS iS tHe WORLD!t!l!c!
What am I missing? Is there a better way to do this?
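A minimal sketch of one way to reach the expected output shown in the question, offered as an illustration rather than a finished solution. It resolves the innermost braces first, splits off the conversion before the namespace lookup, and parks each finished fragment behind a placeholder so that outer conversions only touch literal text. The names format_nested, CONVERSIONS, INNERMOST and PLACEHOLDER, and the \x00-based placeholder scheme, are assumptions made for this example; it also assumes the template itself contains no \x00 characters.

```python
import re

# Conversion codes from the question: u, l, c, t.
CONVERSIONS = {'u': str.upper, 'l': str.lower, 'c': str.capitalize, 't': str.title}
# An innermost field: braces with no braces inside, optional trailing !x.
INNERMOST = re.compile(r'\{([^{}]*?)(?:!([ulct]))?\}')
# Placeholder that stands in for an already resolved fragment.
PLACEHOLDER = re.compile(r'\x00(\d+)\x00')


def format_nested(template, namespace):
    resolved = []  # finished fragments, referenced by index

    def resolve(mo):
        text, conv = mo.group(1), mo.group(2)
        # Look the text up as a key; fall back to the literal text.
        value = str(namespace.get(text, text))
        if conv:
            value = CONVERSIONS[conv](value)
        resolved.append(value)
        return '\x00%d\x00' % (len(resolved) - 1)

    # Peel one nesting level per pass until nothing changes.
    previous = None
    while previous != template:
        previous, template = template, INNERMOST.sub(resolve, template)

    # Expand placeholders; fragments may contain earlier placeholders.
    while PLACEHOLDER.search(template):
        template = PLACEHOLDER.sub(lambda mo: resolved[int(mo.group(1))], template)
    return template


names = {'greeting': 'hello', 'person': 'foreName surName', 'world': 'WORLD'}
sample = 'WELL {greeting!u} {super {person!t}, {tHiS iS tHe {world!t}!l}!c}!'
print(format_nested(sample, names))
# WELL HELLO Super Forename Surname, this is the World!
```

Because resolved fragments are hidden behind placeholders, the outer !l and !c conversions leave "World" and "Forename Surname" untouched, which is what the expected output in the question implies.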

Recursion is complicated here, especially given the limitations of Python's re module. Before I settled on string.Template, I experimented with looping through the string and stacking all relevant indexes, to order each nested field in a hierarchy. Maybe a combination of the two could work, I'm not sure.
Here is, however, a working, non-recursive example:
from collections import ChainMap as _ChainMap
# _sentinel_dict is a private name of the string module and may change between Python versions
from string import Template, _sentinel_dict
class MyTemplate(Template):
    delimiter = '$'
    pattern = r'\$(?:(?P<escaped>\$)|\{(?P<braced>[\w]+)(?:\.(?P<braced_func>\w+)\(\))*\}|(?P<named>(?:[\w]+))(?:\.(?P<named_func>\w+)\(\))*|(?P<invalid>))'
    def substitute(self, mapping=_sentinel_dict, **kws):
        if mapping is _sentinel_dict:
            mapping = kws
        elif kws:
            mapping = _ChainMap(kws, mapping)
        def convert(mo):
            named = mapping.get(mo.group('named'), mapping.get(mo.group('braced')))
            func = mo.group('named_func') or mo.group('braced_func') # i.e. $var.func() or ${var.func()}
            if named is not None:
                if func is not None:
                    # if named doesn't contain func, convert it to str and try again.
                    callable_named = getattr(named, func, getattr(str(named), func, None))
                    if callable_named:
                        return str(callable_named())
                return str(named)
            if mo.group('escaped') is not None:
                return self.delimiter
            if mo.group('invalid') is not None:
                self._invalid(mo)
            if named is not None:
                raise ValueError('Unrecognized named group in pattern',
                                 self.pattern)
        return self.pattern.sub(convert, self.template)
sample1 = 'WELL $greeting.upper() super$person.title(), tHiS iS tHe $world.title().lower().capitalize()!'
S = MyTemplate(sample1)
print(S.substitute(**{'greeting': 'hello', 'person': 'foreName surName', 'world': 'world'}))
# output: WELL HELLO super Forename Surname, tHiS iS tHe World!
sample2 = 'testing${äää.capitalize()}.upper()ing $NOT_DECLARED.upper() $greeting '
sample2 += '$NOT_DECLARED_EITHER ASDF$world.upper().lower()ASDF'
S = MyTemplate(sample2)
print(S.substitute(**{
    'some_var': 'some_value',
    'äää': 'TEST',
    'greeting': 'talofa',
    'person': 'foreName surName',
    'world': 'världen'
}))
# output: testingTest.upper()ing talofa ASDFvärldenASDF
sample3 = 'a=$a.upper() b=$b.bit_length() c=$c.bit_length() d=$d.upper()'
S = MyTemplate(sample3)
print(S.substitute(**{'a':1, 'b':'two', 'c': 3, 'd': 'four'}))
# output: a=1 b=two c=2 d=FOUR
As you can see, $var and ${var} work as expected, but the fields can also handle type methods. If the method is not found, it converts the value to str and checks again.
The methods can't take any arguments though. It also only catches the last method, so chaining doesn't work either, which I believe is because re does not allow multiple groups to use the same name (the regex module does, however).
With some tweaking of the regex pattern and some extra logic in convert, both of these things should be easy to fix.
MyTemplate.substitute works like MyTemplate.safe_substitute by not throwing exceptions on missing keys or fields.
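As a rough sketch of the chaining tweak mentioned above (not the author's code): capture the whole run of .method() calls as one group and apply them left to right. The names FIELD, CALL and render are hypothetical, the sketch is a plain function rather than a Template subclass, and it assumes that unknown fields should be left untouched and that names starting with an underscore should never be called.

```python
import re

# Hypothetical pattern: $$, ${name.method()...} or $name.method()...
FIELD = re.compile(
    r'\$(?:(?P<escaped>\$)'
    r'|\{(?P<braced>\w+)(?P<braced_calls>(?:\.\w+\(\))*)\}'
    r'|(?P<named>\w+)(?P<named_calls>(?:\.\w+\(\))*))'
)
CALL = re.compile(r'\.(\w+)\(\)')


def render(template, mapping):
    def convert(mo):
        if mo.group('escaped'):
            return '$'
        key = mo.group('braced') or mo.group('named')
        if key not in mapping:
            return mo.group(0)  # assumption: leave unknown fields as they are
        value = mapping[key]
        calls = mo.group('braced_calls') or mo.group('named_calls') or ''
        for name in CALL.findall(calls):
            if name.startswith('_'):
                continue  # never call private or dunder methods from a template
            # Try the value itself first, then its str() form, as in the answer.
            method = getattr(value, name, None) or getattr(str(value), name, None)
            if callable(method):
                value = method()
        return str(value)

    return FIELD.sub(convert, template)


print(render('WELL $greeting.upper() $world.title().lower().capitalize()!',
             {'greeting': 'hello', 'world': 'WORLD'}))
# WELL HELLO World!
```

Methods still cannot take arguments here; supporting them would need a real argument parser rather than a regex.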

Related

How can I accept one variable or multple when using a function?

I'm looking to create a search function similar to my rent_book function that allows me to search by first name, second name or title, or any combination of the three. So I could search for first name "George" and title "Animal Farm", or just title "Animal Farm", and receive the same result.
Books are stored in a list of dictionaries. This is the dict structure and the rent_book function. I could do a convoluted nest of ifs, but I'm sure there's a better way.
book = {
    "fname": fname,
    "sname": sname,
    "title": title,
    "avail": True
}
def rent_book(self, fname, sname, title):
    # if is_return is False:
    for x in self.lstBooks:
        if x['fname'] == fname and x['sname'] == sname and x['title'] == title and x['avail'] is True:
            x['avail'] = False
            return True
    return False
Thanks
Since your arguments are the same as the dictionary keys you're matching, you could just use **kwargs and iterate over the kwargs:
def rent_book(self, **kwargs):
    # if is_return:
    #     return False
    if not kwargs:
        raise KeyError("Must search on at least one of fname, sname, or title.")
    for x in self.lstBooks:
        if not (x['avail'] and all(x[k] == v for k, v in kwargs.items())):
            continue
        x['avail'] = False
        return True
    return False
Note that the function will implicitly raise KeyError if it's called with any invalid keys (the x[k] will raise it), and there's an explicit raise KeyError to guard against the caller accidentally not providing any kwargs at all, since otherwise it would just return the first book in lstBooks.
(Yes, pedants, they can still call it with avail=True.)
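The same **kwargs matching can be sketched as the search function the question asks for, which returns matches instead of renting the first one. This is only an illustration: search_books, SEARCHABLE and the sample data are hypothetical, and rejecting unknown keys up front (so that avail cannot be passed) is a design choice of this sketch, not something the answer above does.

```python
SEARCHABLE = {'fname', 'sname', 'title'}  # assumed set of searchable keys


def search_books(books, **criteria):
    """Return every book whose fields equal all of the given criteria."""
    unknown = set(criteria) - SEARCHABLE
    if not criteria or unknown:
        raise ValueError('search on one or more of fname, sname, title')
    return [b for b in books
            if all(b[k] == v for k, v in criteria.items())]


books = [
    {'fname': 'George', 'sname': 'Orwell', 'title': 'Animal Farm', 'avail': True},
    {'fname': 'George', 'sname': 'Orwell', 'title': '1984', 'avail': True},
    {'fname': 'Ian', 'sname': 'Fleming', 'title': 'Casino Royale', 'avail': False},
]

by_title = search_books(books, title='Animal Farm')
by_both = search_books(books, fname='George', title='Animal Farm')
print(by_title == by_both)  # True
```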
I am not sure I understood your question correctly, but if I did, you want the function to handle the following cases:
all of "title", "fname" and "sname" are given;
only one of the three is given;
any pair of the three is given.
If that is the case, you can use default parameters in the function for all three.
def rent_book(self, fname="", sname="", title=""):
    # if is_return is False:
    for x in self.lstBooks:
        if (not fname or x['fname'] == fname) and (not sname or x['sname'] == sname) and (not title or x['title'] == title) and x['avail'] is True:
            x['avail'] = False
            return True
    return False
The expression
(not fname or x['fname'] == fname)
makes sure that the condition is only checked if fname is explicitly provided as a parameter; an empty fname matches every book.
So in all the scenarios mentioned above, you can call the same function with only the parameters you need.
(Don't forget to specify the parameter name when calling, otherwise the value will be taken as the first parameter by default!)
The first thing to do here is think about having a Book class. Initially it needs to support sname, fname, title and avail (a flag to indicate whether or not the book is available to rent).
We could then construct a Book instance with any combination of these attributes or even none.
So let's start with this:
class Book:
    klist = ['sname', 'fname', 'title']
    def __init__(self, **kwargs):
        self.vars = kwargs
        for k in Book.klist:
            self.vars.setdefault(k, None)
        self.vars.setdefault('avail', True)
    @property
    def avail(self):
        return self['avail']
    @avail.setter
    def avail(self, v):
        self.vars['avail'] = v
    def __getitem__(self, item):
        return self.vars.get(item, None)
    def __repr__(self):
        return f'sname={self["sname"]} fname={self["fname"]} title={self["title"]} avail={self.avail}'
Defining __getitem__ keeps the code neater and safer. The __repr__ is optional - it just helps to see what's going on should we ever want to print an instance of this class.
Let's create a couple of Book instances and put them in a list.
booklist = [Book(sname='Orwell', fname='George', title='Animal Farm'), Book(sname='Fleming', fname='Ian', title='Casino Royale')]
Now we want to rent a book. We can provide any one or all of the main attributes. If there's a match for all of the attributes that have been passed and if that book is available, we'll mark it as unavailable and return True. If nothing matches or the book isn't available we return False.
def rent_book(**kwargs):
    for book in booklist:
        if book.avail:
            for k, v in kwargs.items():
                if book[k] is not None and v != book[k]:
                    break
            else:
                book.avail = False
                return True
    return False
Now let's see what happens:
print(rent_book(title='Casino Royale'))
print(rent_book(title='Casino Royale'))
print(rent_book(sname='Orwell', title='Animal Farm'))
Output:
True # the book title matches and it's available
False # the book title matches but it's no longer available
True # both the surname and title match so it's available

Left truncate using python 3.5 str.format?

Q: Is it possible to create a format string using Python 3.5's string formatting syntax to left truncate?
Basically what I want to do is take a git SHA:
"c1e33f6717b9d0125b53688d315aff9cf8dd9977"
And, using only a format string, display only the right 8 chars:
"f8dd9977"
Things I've tried:
Invalid Syntax
>>> "{foo[-8:]}".format(foo="c1e33f6717b9d0125b53688d315aff9cf8dd9977")
>>> "{foo[-8]}".format(foo="c1e33f6717b9d0125b53688d315aff9cf8dd9977")
>>> "{:8.-8}".format("c1e33f6717b9d0125b53688d315aff9cf8dd9977")
Wrong Result
### Results in first 8 not last 8.
>>> "{:8.8}".format("c1e33f6717b9d0125b53688d315aff9cf8dd9977")
Works but inflexible and cumbersome
### solution requires that bar is always length of 40.
>>> bar="c1e33f6717b9d0125b53688d315aff9cf8dd9977"
>>> "{foo[32]}{foo[33]}{foo[34]}{foo[35]}{foo[36]}{foo[37]}{foo[38]}{foo[39]}".format(foo=bar)
A similar question was asked, but never answered. However mine differs in that I am limited to using only format string, I don't have the ability to change the range of the input param. This means that the following is an unacceptable solution:
>>> bar="c1e33f6717b9d0125b53688d315aff9cf8dd9977"
>>> "{0}".format(bar[-8:])
One more aspect I should clarify... the above explains the simplest form of the problem. In actual context, the problem is expressed more correctly as:
>>> import os
>>> "foo {git_sha}".format(**os.environ)
Where I want to left_truncate "git_sha" environment variable. Admittedly this is a tad more complex than simplest form, but if I can solve the simplest - I can find a way to solve the more complex.
So here is my solution, with thanks to @JacquesGaudin and folks on #Python for providing much guidance...
class MyStr(object):
    """Additional format string options."""
    def __init__(self, obj):
        super(MyStr, self).__init__()
        self.obj = obj
    def __format__(self, spec):
        if spec.startswith("ltrunc."):
            offset = int(spec[7:])
            return self.obj[offset:]
        else:
            return self.obj.__format__(spec)
So this works when doing this:
>>> f = {k: MyStr(v) for k, v in os.environ.items()}
>>> "{PATH:ltrunc.-8}".format(**f)
Subclassing str and overriding the __format__ method is an option:
class CustomStr(str):
    def __format__(self, spec):
        if spec == 'trunc_left':
            return self[-8:]
        else:
            return super().__format__(spec)
git_sha = 'c1e33f6717b9d0125b53688d315aff9cf8dd9977'
s = CustomStr(git_sha)
print('{:trunc_left}'.format(s))
Better though, you can create a custom Formatter which inherits from string.Formatter and will provide a format method. By doing this, you can override a number of methods used in the process of formatting strings. In your case, you want to override format_field:
from string import Formatter
class CustomFormatter(Formatter):
    def format_field(self, value, format_spec):
        if format_spec.startswith('trunc_left.'):
            char_number = int(format_spec[len('trunc_left.'):])
            return value[-char_number:]
        return super().format_field(value, format_spec)
environ = {'git_sha': 'c1e33f6717b9d0125b53688d315aff9cf8dd9977'}
fmt = CustomFormatter()
print(fmt.format('{git_sha:trunc_left.8}', **environ))
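For the os.environ case from the question, a mapping can also be handed to the formatter without unpacking it, through Formatter.vformat. The sketch below repeats the formatter so that it runs on its own; the env dict is a stand-in for os.environ, the USER entry is invented for the example, and converting the value with str() is an added assumption so that non-string values do not fail on slicing.

```python
from string import Formatter


class CustomFormatter(Formatter):
    def format_field(self, value, format_spec):
        prefix = 'trunc_left.'
        if format_spec.startswith(prefix):
            count = int(format_spec[len(prefix):])
            # Keep the last `count` characters (count == 0 keeps everything).
            return str(value)[-count:]
        return super().format_field(value, format_spec)


# Stand-in for os.environ; any mapping works as the third argument.
env = {'git_sha': 'c1e33f6717b9d0125b53688d315aff9cf8dd9977', 'USER': 'alice'}
fmt = CustomFormatter()
result = fmt.vformat('foo {git_sha:trunc_left.8} built by {USER}', (), env)
print(result)
# foo f8dd9977 built by alice
```

Fields without the custom spec fall through to the normal formatting rules, so ordinary specs such as {USER:>10} keep working.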
Depending on the usage, you could put this in a context manager and temporarily shadow the builtin format function:
from string import Formatter
class CustomFormat:
    class CustomFormatter(Formatter):
        def format_field(self, value, format_spec):
            if format_spec.startswith('trunc_left.'):
                char_number = int(format_spec[len('trunc_left.'):])
                return value[-char_number:]
            return super().format_field(value, format_spec)
    def __init__(self):
        self.custom_formatter = self.CustomFormatter()
    def __enter__(self):
        self.builtin_format = format
        return self.custom_formatter.format
    def __exit__(self, exc_type, exc_value, traceback):
        # make sure global format is set back to the original
        global format
        format = self.builtin_format
environ = {'git_sha': 'c1e33f6717b9d0125b53688d315aff9cf8dd9977'}
with CustomFormat() as format:
    # Inside this context, format is our custom formatter's method
    print(format('{git_sha:trunc_left.8}', **environ))
print(format) # checking that format is now the builtin function

Refactoring if statements in python

I would like your opinion on a piece of code. I have:
if tuple_type == Operation.START_SERVER:
    dictionary = ServersDictionary()
    dictionary.start(some_param)
elif tuple_type == Operation.STOP_SERVER:
    dictionary = ServersDictionary()
    dictionary.stop(some_param)
(...)
elif tuple_type == Operation.START_APP:
    dictionary = AppsDictionary()
    dictionary.start(some_param)
elif ...
(....)
And there I have 27 if / elifs. Normally, I would use a map-based function dispatcher, but after every if / elif I have two lines of code with the same dictionary reference. Could you suggest a clean solution to replace those ugly constructions?
Creating 27 classes for applying polymorphism or 27 functions doesn't sound good... what do you think?
You're right, a mapping is the way to go. Use getattr to access a method from its name:
mapping = {Operation.START_SERVER: (ServersDictionary, 'start', some_param),
           Operation.STOP_SERVER: (ServersDictionary, 'stop', some_param),
           Operation.START_APP: (AppsDictionary, 'start', some_param)}
...
cls, method, param = mapping[tuple_type]
dictionary = cls()
getattr(dictionary, method)(param)
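A self-contained sketch of the same dispatch idea, so it can be run as is. The two classes are stand-ins for the asker's ServersDictionary and AppsDictionary, the string keys stand in for the Operation constants, and DISPATCH and handle are hypothetical names. Unlike the snippet above, the parameter is passed at call time rather than stored in the mapping, which is an assumption about how some_param is obtained.

```python
class ServersDictionary:
    """Stand-in for the asker's class; methods just describe the call."""
    def start(self, param):
        return 'servers.start(%s)' % param

    def stop(self, param):
        return 'servers.stop(%s)' % param


class AppsDictionary:
    """Stand-in for the asker's class."""
    def start(self, param):
        return 'apps.start(%s)' % param


# One table row per operation: which class to build, which method to call.
DISPATCH = {
    'START_SERVER': (ServersDictionary, 'start'),
    'STOP_SERVER': (ServersDictionary, 'stop'),
    'START_APP': (AppsDictionary, 'start'),
}


def handle(operation, param):
    cls, method_name = DISPATCH[operation]  # KeyError for unknown operations
    return getattr(cls(), method_name)(param)


print(handle('STOP_SERVER', 'web01'))
# servers.stop(web01)
```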
You can enclose the meta info in your enums, if that is OK for your client code, meaning that you own the enums. Here is an example:
class Operation(Enum):
    START_SERVER = (0, "start", ServersDictionary)
    STOP_SERVER = (1, "stop", ServersDictionary)
    START_APP = (2, "start", AppsDictionary)
And then have a single function to handle your operations:
def handle_operation(operation, some_param):
    klass = operation.klass
    dictionary = klass()
    # the method name is the second tuple item, stored as operation.repr
    fn = getattr(dictionary, operation.repr)
    fn(some_param)
This is assuming you are using the Enum you had in one of your questions. In that case, you will need to add one line there:
class Enum(object):
    __metaclass__ = EnumMeta
    def __init__(self, value):
        super(Enum, self).__init__()
        self.value, self.repr, self.klass = value[0], value[1], value[2]
    def __repr__(self):
        return str(self.repr)
Then you will not need any case checks, simply:
handle_operation(tuple_type, some_param)
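With the standard library's enum module (rather than the custom Enum assumed above), a similar effect can be sketched by giving each member a tuple value and unpacking it in __init__. The two dictionary classes are stand-ins and the attribute names method_name and klass are choices made for this example.

```python
from enum import Enum


class ServersDictionary:
    """Stand-in for the asker's class."""
    def start(self, param):
        return 'servers.start(%s)' % param

    def stop(self, param):
        return 'servers.stop(%s)' % param


class AppsDictionary:
    """Stand-in for the asker's class."""
    def start(self, param):
        return 'apps.start(%s)' % param


class Operation(Enum):
    START_SERVER = ('start', ServersDictionary)
    STOP_SERVER = ('stop', ServersDictionary)
    START_APP = ('start', AppsDictionary)

    def __init__(self, method_name, klass):
        # Enum unpacks the tuple value into these arguments.
        self.method_name = method_name
        self.klass = klass


def handle_operation(operation, some_param):
    target = operation.klass()
    return getattr(target, operation.method_name)(some_param)


print(handle_operation(Operation.START_APP, 'shop'))
# apps.start(shop)
```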
Maybe you can represent the operation with a dict or tuple, like
op = {'target': 'Servers', 'action': 'start', 'params': (arg1, arg2)}
then you can access it like
obj = globals()[op['target']+'Dictionary']()
getattr(obj, op['action'])(*op['params'])

How to intercept a specific tuple lookup in python

I'm wondering how one could create a program that dynamically detects the following cases in code, where a variable is compared to hardcoded values instead of using an enumeration.
class AccountType:
    BBAN = '000'
    IBAN = '001'
    UBAN = '002'
    LBAN = '003'
I would like the code to report (drop a warning into the log) in the following case:
payee_account_type = self.get_payee_account_type(rc) # '001' for ex.
if payee_account_type in ('001', '002'): # Report on unsafe lookup
    print 'okay, but not sure about the codes, man'
To encourage people to use the following approach:
payee_account_type = self.get_payee_account_type(rc)
if payee_account_type in (AccountType.IBAN, AccountType.UBAN):
    print 'do this for sure'
Which is much safer.
It's not a problem to verify the == and != checks like below:
if payee_account_type == '001':
    print 'codes again'
By wrapping payee_account_type into a class, with the following __eq__ implemented:
class Variant:
    def __init__(self, value):
        self._value = value
    def get_value(self):
        return self._value
class AccountType:
    BBAN = Variant('000')
    IBAN = Variant('001')
    UBAN = Variant('002')
    LBAN = Variant('003')
class AccountTypeWrapper(object):
    def __init__(self, account_type):
        self._account_type = account_type
    def __eq__(self, other):
        if isinstance(other, Variant):
            # Safe usage
            return self._account_type == other.get_value()
        # The value is hardcoded
        log.warning('Unsafe comparison. Use proper enumeration object')
        return self._account_type == other
But what to do with tuple lookups?
I know, I could create a convenience method wrapping the lookup, where the check can be done:
if IbanUtils.account_type_in(account_type, AccountType.IBAN, AccountType.UBAN):
    pass
class IbanUtils(object):
    def account_type_in(self, account_type, *types_to_check):
        for type in types_to_check:
            if not isinstance(type, Variant):
                log.warning('Unsafe usage')
        return account_type in types_to_check
But it's not an option for me, because I have a lot of legacy code I cannot touch, but still need to report on.
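One observation that may help here, offered as a sketch rather than a confirmed solution: the language reference describes x in (a, b) as equivalent to any(x is e or x == e for e in (a, b)), and when a str cannot compare itself with the wrapper, Python falls back to the wrapper's own __eq__. So a wrapped value should already see tuple lookups through the same __eq__ hook. In the code below the unsafe list stands in for log.warning, and the classes are trimmed copies of the ones in the question.

```python
unsafe = []  # stands in for log.warning in this sketch


class Variant:
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value


class AccountType:
    IBAN = Variant('001')
    UBAN = Variant('002')


class AccountTypeWrapper:
    def __init__(self, account_type):
        self._account_type = account_type

    def __eq__(self, other):
        if isinstance(other, Variant):
            # Safe usage: compared against an enumeration object
            return self._account_type == other.get_value()
        # Hardcoded value: record it, then compare as before
        unsafe.append(other)
        return self._account_type == other


payee = AccountTypeWrapper('001')
hardcoded = payee in ('001', '002')                         # goes through __eq__
enumerated = payee in (AccountType.IBAN, AccountType.UBAN)  # safe path
print(hardcoded, enumerated, unsafe)
```

This only helps where the legacy code receives the wrapped value, for example if get_payee_account_type can be made to return an AccountTypeWrapper; membership tests on plain strings never reach the hook.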

Python recursive setattr()-like function for working with nested dictionaries [duplicate]

There are a lot of good getattr()-like functions for parsing nested dictionary structures, such as:
Finding a key recursively in a dictionary
Suppose I have a python dictionary , many nests
https://gist.github.com/mittenchops/5664038
I would like to make a parallel setattr(). Essentially, given:
cmd = 'f[0].a'
val = 'whatever'
x = {"a":"stuff"}
I'd like to produce a function such that I can assign:
x['f'][0]['a'] = val
More or less, this would work the same way as:
setattr(x,'f[0].a',val)
to yield:
>>> x
{"a":"stuff","f":[{"a":"whatever"}]}
I'm currently calling it setByDot():
setByDot(x,'f[0].a',val)
One problem with this is that if a key in the middle doesn't exist, you need to check for and make an intermediate key if it doesn't exist---ie, for the above:
>>> x = {"a":"stuff"}
>>> x['f'][0]['a'] = val
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'f'
So, you first have to make:
>>> x['f']=[{}]
>>> x
{'a': 'stuff', 'f': [{}]}
>>> x['f'][0]['a']=val
>>> x
{'a': 'stuff', 'f': [{'a': 'whatever'}]}
Another is that the keying when the next item is a list will be different from the keying when the next item is a string, i.e.:
>>> x = {"a":"stuff"}
>>> x['f']=['']
>>> x['f'][0]['a']=val
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
...fails because the assignment was for a null string instead of a null dict. The null dict will be the right assignment for every non-list in dict until the very last one---which may be a list, or a value.
A second problem, pointed out in the comments below by @TokenMacGuy, is that when you have to create a list that does not exist, you may have to create an awful lot of blank values. So,
setattr(x,'f[10].a',val)
---may mean the algorithm will have to make an intermediate like:
>>> x['f']=[{},{},{},{},{},{},{},{},{},{},{}]
>>> x['f'][10]['a']=val
to yield
>>> x
{"a":"stuff","f":[{},{},{},{},{},{},{},{},{},{},{"a":"whatever"}]}
such that this is the setter associated with the getter...
>>> getByDot(x,"f[10].a")
"whatever"
More importantly, the intermediates should /not/ overwrite values that already exist.
Below is the junky idea I have so far---I can identify the lists versus dicts and other data types, and create them where they do not exist. However, I don't see (a) where to put the recursive call, or (b) how to 'build' the deep object as I iterate through the list, and (c) how to distinguish the /probing/ I'm doing as I construct the deep object from the /setting/ I have to do when I reach the end of the stack.
def setByDot(obj,ref,newval):
    ref = ref.replace("[",".[")
    cmd = ref.split('.')
    numkeys = len(cmd)
    count = 0
    for c in cmd:
        count = count+1
        while count < numkeys:
            if c.find("["):
                idstart = c.find("[")
                numend = c.find("]")
                try:
                    deep = obj[int(idstart+1:numend-1)]
                except:
                    obj[int(idstart+1:numend-1)] = []
                    deep = obj[int(idstart+1:numend-1)]
            else:
                try:
                    deep = obj[c]
                except:
                    if obj[c] isinstance(dict):
                        obj[c] = {}
                    else:
                        obj[c] = ''
                    deep = obj[c]
            setByDot(deep,c,newval)
This seems very tricky because you kind of have to look-ahead to check the type of the /next/ object if you're making place-holders, and you have to look-behind to build a path up as you go.
UPDATE
I recently had this question answered, too, which might be relevant or helpful.
I have separated this out into two steps. In the first step, the query string is broken down into a series of instructions. This way the problem is decoupled, we can view the instructions before running them, and there is no need for recursive calls.
def build_instructions(obj, q):
    """
    Breaks down a query string into a series of actionable instructions.
    Each instruction is a (_type, arg) tuple.
    arg -- The key used for the __getitem__ or __setitem__ call on
           the current object.
    _type -- Used to determine the data type for the value of
             obj.__getitem__(arg)
    If a key/index is missing, _type is used to initialize an empty value.
    In this way _type provides the ability to
    """
    arg = []
    _type = None
    instructions = []
    for i, ch in enumerate(q):
        if ch == "[":
            # Begin list query
            if _type is not None:
                arg = "".join(arg)
                if _type == list and arg.isalpha():
                    _type = dict
                instructions.append((_type, arg))
                _type, arg = None, []
            _type = list
        elif ch == ".":
            # Begin dict query
            if _type is not None:
                arg = "".join(arg)
                if _type == list and arg.isalpha():
                    _type = dict
                instructions.append((_type, arg))
                _type, arg = None, []
            _type = dict
        elif ch.isalnum():
            if i == 0:
                # Query begins with alphanum, assume dict access
                _type = type(obj)
            # Fill out args
            arg.append(ch)
        elif ch != "]":
            # "]" only closes an index and is skipped; anything else is invalid
            raise TypeError("Unrecognized character: {}".format(ch))
    if _type is not None:
        # Finish up last query
        instructions.append((_type, "".join(arg)))
    return instructions
For your example
>>> x = {"a": "stuff"}
>>> print(build_instructions(x, "f[0].a"))
[(<type 'dict'>, 'f'), (<type 'list'>, '0'), (<type 'dict'>, 'a')]
The expected return value is simply the _type (first item) of the next tuple in the instructions. This is very important because it allows us to correctly initialize/reconstruct missing keys.
This means that our first instruction operates on a dict, either sets or gets the key 'f', and is expected to return a list. Similarly, our second instruction operates on a list, either sets or gets the index 0 and is expected to return a dict.
Now let's create our _setattr function. This gets the proper instructions and goes through them, creating key-value pairs as necessary. Finally, it also sets the val we give it.
def _setattr(obj, query, val):
    """
    This is a special setattr function that will take in a string query,
    interpret it, add the appropriate data structure to obj, and set val.
    We only define two actions that are available in our query string:
    .x -- dict.__setitem__(x, ...)
    [x] -- list.__setitem__(x, ...) OR dict.__setitem__(x, ...)
           the calling context determines how this is interpreted.
    """
    instructions = build_instructions(obj, query)
    for i, (_, arg) in enumerate(instructions[:-1]):
        _type = instructions[i + 1][0]
        obj = _set(obj, _type, arg)
    _type, arg = instructions[-1]
    _set(obj, _type, arg, val)
def _set(obj, _type, arg, val=None):
    """
    Helper function for calling obj.__setitem__(arg, val or _type()).
    """
    if val is not None:
        # Time to set our value
        _type = type(val)
    if isinstance(obj, dict):
        if arg not in obj:
            # If key isn't in obj, initialize it with _type()
            # or set it with val
            obj[arg] = (_type() if val is None else val)
        obj = obj[arg]
    elif isinstance(obj, list):
        n = len(obj)
        arg = int(arg)
        if n > arg:
            obj[arg] = (_type() if val is None else val)
        else:
            # Need to amplify our list, initialize empty values with _type()
            obj.extend([_type() for x in range(arg - n + 1)])
        obj = obj[arg]
    return obj
And just because we can, here's a _getattr function.
def _getattr(obj, query):
    """
    Very similar to _setattr. Instead of setting attributes they will be
    returned. As expected, an error will be raised if a __getitem__ call
    fails.
    """
    instructions = build_instructions(obj, query)
    for i, (_, arg) in enumerate(instructions[:-1]):
        _type = instructions[i + 1][0]
        obj = _get(obj, _type, arg)
    _type, arg = instructions[-1]
    return _get(obj, _type, arg)
def _get(obj, _type, arg):
    """
    Helper function for calling obj.__getitem__(arg).
    """
    if isinstance(obj, dict):
        obj = obj[arg]
    elif isinstance(obj, list):
        arg = int(arg)
        obj = obj[arg]
    return obj
In action:
>>> x = {"a": "stuff"}
>>> _setattr(x, "f[0].a", "test")
>>> print x
{'a': 'stuff', 'f': [{'a': 'test'}]}
>>> print _getattr(x, "f[0].a")
"test"
>>> x = ["one", "two"]
>>> _setattr(x, "3[0].a", "test")
>>> print x
['one', 'two', [], [{'a': 'test'}]]
>>> print _getattr(x, "3[0].a")
"test"
Now for some cool stuff. Unlike python, our _setattr function can set unhashable dict keys.
x = []
_setattr(x, "1.4", "asdf")
print x
[{}, {'4': 'asdf'}] # A list, which isn't hashable
>>> y = {"a": "stuff"}
>>> _setattr(y, "f[1.4]", "test") # We're indexing f with 1.4, which is a list!
>>> print y
{'a': 'stuff', 'f': [{}, {'4': 'test'}]}
>>> print _getattr(y, "f[1.4]") # Works for _getattr too
"test"
We aren't really using unhashable dict keys, but it looks like we are in our query language so who cares, right!
Finally, you can run multiple _setattr calls on the same object, just give it a try yourself.
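The two-step idea above (parse the query, then walk and create missing containers) can also be sketched more compactly. This is an independent illustration, not the answer's code: parse_path, _ensure and set_by_dot are hypothetical names, list indices are assumed to be written as [n], and missing list slots are padded with empty containers of the type the next key requires. Existing intermediates are never replaced, so an intermediate of the wrong type raises instead of being overwritten.

```python
import re

# A path token is either [digits] or a run of characters without ., [ or ].
TOKEN = re.compile(r'\[(\d+)\]|([^.\[\]]+)')


def parse_path(path):
    """'f[0].a' -> ['f', 0, 'a'] (ints are list indices, strings dict keys)."""
    return [int(index) if index else key for index, key in TOKEN.findall(path)]


def _ensure(container, key, factory):
    """Create container[key] with factory() only if it does not exist yet."""
    if isinstance(key, int):
        while len(container) <= key:
            container.append(factory())
    elif key not in container:
        container[key] = factory()


def set_by_dot(obj, path, value):
    keys = parse_path(path)
    # Look ahead one key to decide whether a missing step is a list or a dict.
    for key, next_key in zip(keys, keys[1:]):
        factory = list if isinstance(next_key, int) else dict
        _ensure(obj, key, factory)
        obj = obj[key]
    _ensure(obj, keys[-1], lambda: None)  # make room for the final slot
    obj[keys[-1]] = value


x = {'a': 'stuff'}
set_by_dot(x, 'f[0].a', 'whatever')
set_by_dot(x, 'f[2].b', 1)
print(x)
# {'a': 'stuff', 'f': [{'a': 'whatever'}, {}, {'b': 1}]}
```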
>>> class D(dict):
...     def __missing__(self, k):
...         ret = self[k] = D()
...         return ret
...
>>> x=D()
>>> x['f'][0]['a'] = 'whatever'
>>> x
{'f': {0: {'a': 'whatever'}}}
You can hack something together by fixing two problems:
A list that automatically grows when accessed out of bounds (PaddedList)
A way to delay the decision of what to create (list or dict) until you access it for the first time (DictOrList)
So the code will look like this:
import collections
class PaddedList(list):
    """ List that grows automatically up to the max index ever passed"""
    def __init__(self, padding):
        self.padding = padding
    def __getitem__(self, key):
        if isinstance(key, int) and len(self) <= key:
            self.extend(self.padding() for i in xrange(key + 1 - len(self)))
        return super(PaddedList, self).__getitem__(key)
class DictOrList(object):
    """ Object proxy that delays the decision of being a List or Dict """
    def __init__(self, parent):
        self.parent = parent
    def __getitem__(self, key):
        # Type of the structure depends on the type of the key
        if isinstance(key, int):
            obj = PaddedList(MyDict)
        else:
            obj = MyDict()
        # Update parent references with the selected object
        parent_seq = (self.parent if isinstance(self.parent, dict)
                      else xrange(len(self.parent)))
        for i in parent_seq:
            if self == parent_seq[i]:
                parent_seq[i] = obj
                break
        return obj[key]
class MyDict(collections.defaultdict):
    def __missing__(self, key):
        ret = self[key] = DictOrList(self)
        return ret
def pprint_mydict(d):
    """ Helper to print MyDict as dicts """
    print d.__str__().replace('defaultdict(None, {', '{').replace('})', '}')
x = MyDict()
x['f'][0]['a'] = 'whatever'
y = MyDict()
y['f'][10]['a'] = 'whatever'
pprint_mydict(x)
pprint_mydict(y)
And the output of x and y will be:
{'f': [{'a': 'whatever'}]}
{'f': [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {'a': 'whatever'}]}
The trick consists of creating a defaultdict of objects that can be either a dict or a list depending on how you access them.
So when you have the assignment x['f'][10]['a'] = 'whatever' it will work the following way:
Get x['f']. It won't exist, so it will return a DictOrList object for the index 'f'.
Get x['f'][10]. DictOrList.__getitem__ will be called with an integer index. The DictOrList object will replace itself in the parent collection with a PaddedList.
Accessing the 11th element in the PaddedList will grow it by 11 elements and will return the MyDict element in that position.
Assign "whatever" to x['f'][10]['a'].
Both PaddedList and DictOrList are a bit hacky, but after all the assignments there is no more magic: you have a structure of dicts and lists.
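The code above is written for Python 2 (xrange, print statement). As a small Python 3 sketch of just the padding step, assuming plain dict as the padding factory instead of MyDict:

```python
class PaddedList(list):
    """List that grows on read access up to the highest index requested."""

    def __init__(self, padding):
        super().__init__()
        self.padding = padding  # zero-argument factory for filler items

    def __getitem__(self, key):
        if isinstance(key, int) and key >= len(self):
            # Append fresh filler objects until index `key` exists.
            self.extend(self.padding() for _ in range(key + 1 - len(self)))
        return super().__getitem__(key)


padded = PaddedList(dict)
padded[2]['a'] = 'whatever'   # reading index 2 grows the list to 3 items
print(padded)
# [{}, {}, {'a': 'whatever'}]
```

Only __getitem__ is overridden, so a direct assignment such as padded[5] = 1 on a shorter list still raises IndexError.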
It is possible to synthesize recursively setting items/attributes by overriding __getitem__ to return a proxy that can set a value in the original object.
I happen to be working on a library that does a few things similar to this, so I was working on a class that can dynamically assign its own subclasses at instantiation. It makes working with this sort of thing easier, but if that kind of hacking makes you squeamish, you can get similar behavior by creating a ProxyObject similar to the one I create and by creating the individual classes used by the ProxyObject dynamically in a function. Something like:
class ProxyObject(object):
    ... #see below
def instanciateProxyObjcet(val):
    class ProxyClassForVal(ProxyObject,val.__class__):
        pass
    return ProxyClassForVal(val)
Using a dictionary like the one I've used in FlexibleObject below would make that implementation significantly more efficient, if this is the way you implement it. The code I will be providing uses the FlexibleObject, though. Right now it only supports classes that, like almost all of Python's builtin classes, are capable of being generated by taking an instance of themselves as the sole argument to their __init__/__new__. In the next week or two, I'll add support for anything pickleable, and link to a github repository that contains it. Here's the code:
class FlexibleObject(object):
    """ A FlexibleObject is a baseclass for allowing type to be declared
    at instantiation rather than in the declaration of the class.
    Usage:
    class DoubleAppender(FlexibleObject):
        def append(self,x):
            super(self.__class__,self).append(x)
            super(self.__class__,self).append(x)
    instance1 = DoubleAppender(list)
    instance2 = DoubleAppender(bytearray)
    """
    classes = {}
    def __new__(cls,supercls,*args,**kws):
        if isinstance(supercls,type):
            supercls = (supercls,)
        else:
            supercls = tuple(supercls)
        if (cls,supercls) in FlexibleObject.classes:
            return FlexibleObject.classes[(cls,supercls)](*args,**kws)
        superclsnames = tuple([c.__name__ for c in supercls])
        name = '%s%s' % (cls.__name__,superclsnames)
        d = dict(cls.__dict__)
        d['__class__'] = cls
        if cls == FlexibleObject:
            d.pop('__new__')
        try:
            d.pop('__weakref__')
        except:
            pass
        d['__dict__'] = {}
        newcls = type(name,supercls,d)
        FlexibleObject.classes[(cls,supercls)] = newcls
        return newcls(*args,**kws)
Then, to use this to synthesize looking up attributes and items of a dictionary-like object, you can do something like this:
class ProxyObject(FlexibleObject):
    @classmethod
    def new(cls,obj,quickrecdict,path,attribute_marker):
        self = ProxyObject(obj.__class__,obj)
        # Write through __dict__ directly, because __setattr__ is overridden below
        self.__dict__['reference'] = quickrecdict
        self.__dict__['path'] = path
        self.__dict__['attr_mark'] = attribute_marker
        return self

    def __getitem__(self,item):
        path = self.__dict__['path'] + [item]
        ref = self.__dict__['reference']
        return ref[tuple(path)]

    def __setitem__(self,item,val):
        path = self.__dict__['path'] + [item]
        ref = self.__dict__['reference']
        ref.dict[tuple(path)] = ProxyObject.new(val,ref,
                                                path,self.__dict__['attr_mark'])

    def __getattribute__(self,attr):
        if attr == '__dict__':
            return object.__getattribute__(self,'__dict__')
        path = self.__dict__['path'] + [self.__dict__['attr_mark'],attr]
        ref = self.__dict__['reference']
        return ref[tuple(path)]

    def __setattr__(self,attr,val):
        path = self.__dict__['path'] + [self.__dict__['attr_mark'],attr]
        ref = self.__dict__['reference']
        ref.dict[tuple(path)] = ProxyObject.new(val,ref,
                                                path,self.__dict__['attr_mark'])
class UniqueValue(object):
    pass

class QuickRecursiveDict(object):
    def __init__(self,dictionary=None):
        # Avoid sharing one mutable default dictionary between instances
        self.dict = {} if dictionary is None else dictionary
        self.internal_id = UniqueValue()
        self.attr_marker = UniqueValue()

    def __getitem__(self,item):
        if item in self.dict:
            val = self.dict[item]
            try:
                # Already a proxy created by this dictionary? Return it as is.
                if val.__dict__['path'][0] == self.internal_id:
                    return val
                else:
                    raise TypeError
            except Exception:
                return ProxyObject.new(val,self,[self.internal_id,item],
                                       self.attr_marker)
        try:
            if item[0] == self.internal_id:
                return ProxyObject.new(KeyError(),self,list(item),
                                       self.attr_marker)
        except TypeError:
            pass #Item isn't iterable
        return ProxyObject.new(KeyError(),self,[self.internal_id,item],
                               self.attr_marker)

    def __setitem__(self,item,val):
        self.dict[item] = val
The particulars of the implementation will vary depending on what you want. It's obviously significantly easier to just override __getitem__ in the proxy than it is to override both __getitem__ and __getattribute__ or __getattr__. The syntax you are using in setbydot makes it look like you would be happiest with some solution that overrides a mixture of the two.
If you are just using the dictionary to compare values (using ==, <=, >=, etc.), overriding __getattribute__ works really nicely. If you want to do something more sophisticated, you will probably be better off overriding __getattr__ and doing some checks in __setattr__ to determine whether you want to synthesize setting the attribute by setting a value in the dictionary, or whether you want to actually set the attribute on the item you've obtained. Or you might want to handle it so that if your object has an attribute, __getattribute__ returns a proxy to that attribute and __setattr__ always just sets the attribute in the object (in which case, you can completely omit it). All of these things depend on exactly what you are trying to use the dictionary for.
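To make the difference between the two hooks concrete, here is a small sketch. The class names `ViaGetattr` and `ViaGetattribute` are hypothetical and are not part of the code above; they only show when each hook is consulted.

```python
class ViaGetattr(object):
    def __init__(self):
        self.real = 'stored on the instance'

    def __getattr__(self, name):
        # Called only when normal attribute lookup has already failed.
        return 'synthesized %s' % name


class ViaGetattribute(object):
    def __init__(self):
        # Store a real attribute; it will be shadowed by the hook below.
        object.__setattr__(self, 'real', 'stored on the instance')

    def __getattribute__(self, name):
        # Called for every attribute lookup, whether or not the attribute exists.
        if name.startswith('__'):
            return object.__getattribute__(self, name)
        return 'synthesized %s' % name


a = ViaGetattr()
b = ViaGetattribute()
print(a.real, '|', a.missing)
print(b.real, '|', b.missing)
```

With __getattr__, real attributes of the wrapped object stay reachable and only missing names are synthesized. With __getattribute__, every ordinary lookup is redirected, which is why the proxy above has to special-case __dict__.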
You also may want to create __iter__ and the like. It takes a little bit of effort to make them, but the details should follow from the implementation of __getitem__ and __setitem__.
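As a rough sketch of what that could look like, assume nested keys are stored as tuple paths in one flat dictionary, much as QuickRecursiveDict does. `FlatPathDict` is a hypothetical, simplified stand-in (no proxies, no internal id), so the details would need adapting to the real class.

```python
class FlatPathDict(object):
    """Hypothetical stand-in: nested keys live as tuple paths in one flat dict."""

    def __init__(self):
        self.dict = {}

    def __setitem__(self, path, val):
        self.dict[tuple(path)] = val

    def __getitem__(self, path):
        return self.dict[tuple(path)]

    def __iter__(self):
        # Yield each distinct top-level key once, however deep its paths go.
        seen = set()
        for path in self.dict:
            if path[0] not in seen:
                seen.add(path[0])
                yield path[0]

    def __len__(self):
        # Number of distinct top-level keys.
        return len(set(path[0] for path in self.dict))


d = FlatPathDict()
d[(0, 13)] = 9
d[(0, 13, 'forever')] = 'young'
d[(1,)] = 'x'
print(sorted(d), len(d))
```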
Finally, I'm going to briefly summarize the behavior of the QuickRecursiveDict in case it's not immediately clear from inspection. The try/excepts are just shorthand for checking whether the ifs can be performed. The one major defect of synthesizing the recursive setting, rather than finding a way to actually do it, is that you can no longer raise KeyErrors when you try to access a key that hasn't been set. However, you can come pretty close by returning an instance of a subclass of KeyError, which is what I do in the example. I haven't tested it so I won't add it to the code, but you may want to pass some human-readable representation of the key to KeyError.
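One way the human-readable key could be built is sketched below. `missing_key_error` is a hypothetical helper, not part of the code above, and it assumes the path has already had the internal id and attribute marker stripped out.

```python
def missing_key_error(path):
    """Build a KeyError whose message spells out the path that was looked up."""
    # Render each step of the path the way it would be written as a subscript.
    readable = ''.join('[%r]' % (part,) for part in path)
    return KeyError(readable)


err = missing_key_error([0, 13, 'forever'])
print(err.args[0])
```

The resulting exception object could then be handed to ProxyObject.new in place of the bare KeyError(), so that whoever inspects the proxy can see which key was never set.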
But aside from all that it works rather nicely.
>>> qrd = QuickRecursiveDict()
>>> qrd[0][13] # returns an instance of a subclass of KeyError
>>> qrd[0][13] = 9
>>> qrd[0][13] # 9
>>> qrd[0][13]['forever'] = 'young'
>>> qrd[0][13] # 9
>>> qrd[0][13]['forever'] # 'young'
>>> qrd[0] # returns an instance of a subclass of KeyError
>>> qrd[0] = 0
>>> qrd[0] # 0
>>> qrd[0][13]['forever'] # 'young'
One more caveat: the thing being returned is not quite what it looks like. It's a proxy to what it looks like. If you want the int 9, you need int(qrd[0][13]), not qrd[0][13]. For ints this doesn't matter much, since +, -, == and all that bypass __getattribute__, but for lists you would lose attributes like append if you didn't recast them. (You'd keep len and other builtin functions, just not attributes of list. You lose __len__ when you look it up explicitly.)
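The effect can be reproduced in isolation with a short sketch. `RedirectingList` is a hypothetical class that only mimics the proxy's __getattribute__ override; it is not the ProxyObject from above.

```python
class RedirectingList(list):
    def __getattribute__(self, name):
        # Every explicit attribute lookup is redirected, as in the proxy.
        return 'redirected lookup of %r' % name


p = RedirectingList([1, 2, 3])

# Builtins such as len() look special methods up on the type,
# so they bypass __getattribute__ and still work.
print(len(p))

# Explicit attribute access no longer reaches the list method.
print(p.append)

# Recasting gives back a plain list with its normal attributes.
plain = list(p)
plain.append(4)
print(plain)
```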
So that's it. The code's a little bit convoluted, so let me know if you have any questions. I probably can't answer them until tonight unless the answer's really brief. I wish I had seen this question sooner; it's a really cool question, and I'll try to post a cleaner solution soon. I had fun trying to code a solution into the wee hours of last night. :)
