python - recursively deleting dict keys?

python - recursively deleting dict keys? - python

I'm using Python 2.7 with plistlib to import a .plist in a nested dict/array form, then look for a particular key and delete it wherever I see it.
When it comes to the actual files we're working with in the office, I already know where to find the values -- but I wrote my script with the idea that I didn't, in the hopes that I wouldn't have to make changes in the future if the file structure changes or we need to do likewise to other similar files.
Unfortunately I seem to be trying to modify a dict while iterating over it, but I'm not certain how that's actually happening, since I'm using iteritems() and enumerate() to get generators and work with those instead of the object I'm actually working with.
def scrub(someobject, badvalue='_default'): ##_default isn't the real variable
"""Walks the structure of a plistlib-created dict and finds all the badvalues and viciously eliminates them.
Can optionally be passed a different key to search for."""
count = 0
try:
iterator = someobject.iteritems()
except AttributeError:
iterator = enumerate(someobject)
for key, value in iterator:
try:
scrub(value)
except:
pass
if key == badvalue:
del someobject[key]
count += 1
return "Removed {count} instances of {badvalue} from {file}.".format(count=count, badvalue=badvalue, file=file)
Unfortunately, when I run this on my test .plist file, I get the following error:
Traceback (most recent call last):
File "formscrub.py", line 45, in <module>
scrub(loadedplist)
File "formscrub.py", line 19, in scrub
for key, value in iterator:
RuntimeError: dictionary changed size during iteration
So the problem might be the recursive call to itself, but even then shouldn't it just be removing from the original object? I'm not sure how to avoid recursion (or if that's the right strategy) but since it's a .plist, I do need to be able to identify when things are dicts or lists and iterate over them in search of either (a) more dicts to search, or (b) the actual key-value pair in the imported .plist that I need to delete.
Ultimately, this is a partial non-issue, in that the files I'll be working with on a regular basis have a known structure. However, I was really hoping to create something that doesn't care about the nesting or order of the object it's working with, as long as it's a Python dict with arrays in it.

Adding or removing items to/from a sequence while iterating over this sequence is tricky at best, and just illegal (as you just discovered) with dicts. The right way to remove entries from a dict while iterating over it is to iterate on a snapshot of the keys. In Python 2.x, dict.keys() provides such a snapshot. So for dicts the solution is:
for key in mydict.keys():
if key == bad_value:
del mydict[key]
As mentionned by cpizza in a comment, for python3, you'll need to explicitely create the snapshot using list():
for key in list(mydict.keys()):
if key == bad_value:
del mydict[key]
For lists, trying to iterate on a snapshot of the indexes (ie for i in len(thelist):) would result in an IndexError as soon as anything is removed (obviously since at least the last index will no more exist), and even if not you might skip one or more items (since the removal of an item makes the sequence of indexes out of sync with the list itself). enumerate is safe against IndexError (since the iteration will stop by itself when there's no more 'next' item in the list, but you'll still skip items:
>>> mylist = list("aabbccddeeffgghhii")
>>> for x, v in enumerate(mylist):
... if v in "bdfh":
... del mylist[x]
>>> print mylist
['a', 'a', 'b', 'c', 'c', 'd', 'e', 'e', 'f', 'g', 'g', 'h', 'i', 'i']
Not a quite a success, as you can see.
The known solution here is to iterate on reversed indexes, ie:
>>> mylist = list("aabbccddeeffgghhii")
>>> for x in reversed(range(len(mylist))):
... if mylist[x] in "bdfh":
... del mylist[x]
>>> print mylist
['a', 'a', 'c', 'c', 'e', 'e', 'g', 'g', 'i', 'i']
This works with reversed enumeration too, but we dont really care.
So to summarize: you need two different code path for dicts and lists - and you also need to take care of "not container" values (values which are neither lists nor dicts), something you do not take care of in your current code.
def scrub(obj, bad_key="_this_is_bad"):
if isinstance(obj, dict):
# the call to `list` is useless for py2 but makes
# the code py2/py3 compatible
for key in list(obj.keys()):
if key == bad_key:
del obj[key]
else:
scrub(obj[key], bad_key)
elif isinstance(obj, list):
for i in reversed(range(len(obj))):
if obj[i] == bad_key:
del obj[i]
else:
scrub(obj[i], bad_key)
else:
# neither a dict nor a list, do nothing
pass
As a side note: never write a bare except clause. Never ever. This should be illegal syntax, really.

Here a generalized version of the one of #bruno desthuilliers, with a callable to test against the keys.
def clean_dict(obj, func):
"""
This method scrolls the entire 'obj' to delete every key for which the 'callable' returns
True
:param obj: a dictionary or a list of dictionaries to clean
:param func: a callable that takes a key in argument and return True for each key to delete
"""
if isinstance(obj, dict):
# the call to `list` is useless for py2 but makes
# the code py2/py3 compatible
for key in list(obj.keys()):
if func(key):
del obj[key]
else:
clean_dict(obj[key], func)
elif isinstance(obj, list):
for i in reversed(range(len(obj))):
if func(obj[i]):
del obj[i]
else:
clean_dict(obj[i], func)
else:
# neither a dict nor a list, do nothing
pass
And an example with a regex callable :
func = lambda key: re.match(r"^<div>", key)
clean_dict(obj, func)

def walk(d, badvalue, answer=None, sofar=None):
if sofar is None:
sofar = []
if answer is None:
answer = []
for k,v in d.iteritems():
if k == badvalue:
answer.append(sofar + [k])
if isinstance(v, dict):
walk(v, badvalue, answer, sofar+[k])
return answer
def delKeys(d, badvalue):
for path in walk(d, badvalue):
dd = d
while len(path) > 1:
dd = dd[path[0]]
path.pop(0)
dd.pop(path[0])
Output
In [30]: d = {1:{2:3}, 2:{3:4}, 5:{6:{2:3}, 7:{1:2, 2:3}}, 3:4}
In [31]: delKeys(d, 2)
In [32]: d
Out[32]: {1: {}, 3: 4, 5: {6: {}, 7: {1: 2}}}

Related

How to check if a dictionary is invertible

I am working on a question that requires to point out the problem in a function that determines whether a dictionary is invertible (for every value appearing in the dictionary, there is only one key that maps to that value) or not. The question is below:
def is_invertible(adict):
inv_dict = make_inv_dict(adict)
return adict == inv_dict
def make_inv_dict(adict):
if len(adict) > 0:
key, val = adict.popitem()
adict = make_inv_dict(adict)
if val not in adict.values():
adict[key] = val
return adict
else:
return {}
Currently, this returns False for {'a': 'b', 'b': 'e', 'c': 'f'} when it is supposed to be True. I am sure that there is an issue in make_inv_dict function; is it simply because adict is not an appropriate variable name in adict = make_inv_dict(adict)? Or is there another reason why the function returns a wrong outcome?

At least three problems with the function you've given:
The condition adict == inv_dict checks whether the dictionary is its own inverse, not merely that it's invertible.
It uses pop_item to remove a key/value pair from the input dictionary, and then inserts it backwards, so the function operates in-place. By the time it's finished, adict's original contents will be completely destroyed, so the comparison will be meaningless anyway.
The line adict[key] = val inserts the key/value pair in the original order; the inverse order should be adict[val] = key. So this function doesn't do what its name promises, which is to make an inverse dictionary.
It should be noted that if not for the destruction of the dictionary (2.), the mistakes (1.) and (3.) would cancel out, because the outcome of the function is to rebuild the original dictionary but without duplicate values.
I'm guessing some people will find this question if they're looking for a correct way to invert a dictionary, so here is one: this function returns the inverse dictionary if it's possible, or None otherwise.
def invert_dict(d):
out = dict()
for k,v in dict.items():
if v in out:
return None
out[v] = k
return out
Helper function returning a boolean for whether a dictionary is invertible:
def is_invertible(d):
return invert_dict(d) is not None

My Answer:
def is_invertible(dict_var):
return len(dict_var.values()) == len(set(dict_var.values()))

Find in which of multiple sets a value belongs to

I have several sets of values, and need to check in which of some of them a given value is located, and return the name of that set.
value = 'a'
set_1 = {'a', 'b', 'c'}
set_2 = {'d', 'e', 'f'}
set_3 = {'g', 'h', 'i'}
set_4 = {'a', 'e', 'i'}
I'd like to check if value exists in sets 1-3, without including set_4 in the method, and return the set name. So something like:
find_set(value in set_1, set_2, set_3)
should return
set_1
Maybe some neat lambda function? I tried
w = next(n for n,v in filter(lambda t: isinstance(t[1],set), globals().items()) if value in v)
from Find if value exists in multiple lists but that approach checks ALL local/global sets. That won't work here, because the value can exist in several of them. I need to be able to specify in which sets to look.

Don't use an ugly hackish lambda which digs in globals so you can get a name; that will confuse anyone reading your code including yourself after a few weeks :-).
You want to be able to get a name for sets you have defined, well, this is why we have dictionaries. Make a dictionary out of your sets and then you can create handy/readable set/list comprehensions to get what you want in a compact readable fashion:
>>> d = {'set_1': set_1, 'set_2': set_2, 'set_3': set_3, 'set_4': set_4}
To catch all sets in which 'a' is located:
>>> {name for name, items in d.items() if 'a' in items}
{'set_1', 'set_4'}
To exclude some name add another the required clause to the if for filtering:
>>> {name for name, items in d.items() if 'a' in items and name != 'set_4'}
{'set_1'}
You can of course factor this into a function and be happy you'll be able to understand it if you bump into it in the future:
def find_sets(val, *excludes, d=d):
return {n for n, i in d.items() if val in i and n not in excludes}
This behaves in a similar way as the previous. d=d is probably not the way you want to do it, you'll probably be better of using some **d syntax for this.
If you just want to get the first value, return the next(comprehension) from your function like this:
def find_sets(val, *excludes, d=d):
return next((n for n, i in d.items() if val in i and n not in excludes), '')
The '' just indicates a default value to be returned if no elements are actually found, that is, when called with a value that isn't present, an empty string will be returned (subject to change according to your preferences):
>>> find_sets('1')
''

For loop inside while loop not working in Python

I am facing very unusual problem, below is code inside a class where pitnamebasename is 2d list.
For example:=
pitnamebasename= [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
Above list is not necessary to be in any order like ['d',''] can be at 0th order.
Here is my function (inside a class):-
def getRole(self,pitname):
basename=pitname
print '\n\n\nbasename=',basename
ind=True
while pitname and ind:
ind=False
basename=pitname
print 'basename=',basename
for i in self.pitnamebasename:
print 'comparing-',i[0],'and',pitname
if i[0] == pitname:
pitname=i[1]
print 'newpitname=',pitname
ind=True
break
print 'returning-',basename
return basename
pitname is the string for example here it can be 'a'. I want return value to be 'd' mean the traversing must be like a to b, b to c and d to None, hence return value must be d.
Please don't suggest me any other methods to solve.
Now my problem is that in the for loop its not looping till last but getting out in middle. Like return value is either b or c or even d depends on what I am searching. Actually list is very very long. Strange thing I noted that for loop loops only to that index where it loops till its first time. Like here first time for loop gets end when it find 'a' and pitname becomes 'b' but when it search for 'b' it loops till it find 'a' only. Does anyone knows how it is happening?

pitnamebasename= [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
First, change your '2d' array into a dict:
pitnamebasename = dict(pitnamebasename)
Now, it should be a simple matter of walking from element to element, using the value associated with the current key as the next key, until the value is the empty string; then
return the current key. If pitname ever fails to exist as a key, it's treated as if it does exist and maps to the empty string.
def getRole(self, pitname):
while pitnamebasename.get('pitname', '') != '':
pitname = pitnamebasename[pitname]
return pitname
A defaultdict could also be used in this case:
import collections.defaultdict
pitnamebasename = defaultdict(str, pitnamebasename)
def getRole(self, pitname):
while pitnamebasename[pitname] != '':
pitname = pitnamebasename[pitname]
return pitname

You asked for the solution to your problem but I am having trouble replicating the problem. This code does the same thing without requiring you to change your entire class storage system.
By converting your list of lists into a dictionary lookup (looks like the following)
as_list_of_lists = [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
as_dict = dict(as_list_of_lists)
# as_dict = {'a': 'b', 'c': 'd', 'b': 'c', 'd': '', 'h': 'f', 'm': 'f', 'n': 'm'}
we can do a default dictionary lookup using the dictionary method .get. This will look for an item (say we pass it 'a') and return the associated value ('b'). If we look for something that isn't in the dictionary, .get will return the second value (a default value) which we can supply as ''. Hence as_dict.get('z','') will return ''
class this_class
def __init__(self):
self.pitnamebasename= [['a','b'],['n','m'],['b','c'],['m','f'],['c','d'],['d',''],['h','f']]
def getRole(self,pitname):
lookup = dict(self.pitnamebasename)
new_pitname = pitname
while new_pitname != '':
pitname = new_pitname
new_pitname = lookup.get(pitname, '')
return pitname

Search value in tree of dicts in python

I have a giant dict with a lot of nested dicts -- like a giant tree, and depth in unknown.
I need a function, something like find_value(), that takes dict, value (as string), and returns list of lists, each one of them is "path" (sequential chain of keys from first key to key (or key value) with found value). If nothing found, returns empty list.
I wrote this code:
def find_value(dict, sought_value, current_path, result):
for key,value in dict.items():
current_path.pop()
current_path.append(key)
if sought_value in key:
result.append(current_path)
if type(value) == type(''):
if sought_value in value:
result.append(current_path+[value])
else:
current_path.append(key)
result = find_value(value, sought_value, current_path, result)
current_path.pop()
return result
I call this function to test:
result = find_value(self.dump, sought_value, ['START_KEY_FOR_DELETE'], [])
if not len(result):
print "forgive me, mylord, i'm afraid we didn't find him.."
elif len(result) == 1:
print "bless gods, for all that we have one match, mylord!"
For some inexplicable reasons, my implementation of this function fails some of my tests. I started to debug and find out, that even if current_path prints correct things (it always does, I checked!), the result is inexplicably corrupted. Maybe it is because of recursion magic?
Can anyone help me with this problem? Maybe there is a simple solution for my tasks?

When you write result.append(current_path), you're not copying current_path, which continues to mutate. Change it to result.append(current_path[:]).

I doubt you can do much to optimize a recursive search like that. Assuming there are many lookups on the same dictionary, and the dictionary doesn't change once loaded, then you can index it to get O(1) lookups...
def build_index(src, dest, path=[]):
for k, v in src.iteritems():
fk = path+[k]
if isinstance(v, dict):
build_index(v, dest, fk)
else:
try:
dest[v].append(fk)
except KeyError:
dest[v] = [fk]
>>> data = {'foo': {'sub1': 'blah'}, 'bar': {'sub2': 'whatever'}, 'baz': 'blah'}
>>> index = {}
>>> build_index(data, index)
>>> index
{'blah': [['baz'], ['foo', 'sub1']], 'whatever': [['bar', 'sub2']]}
>>> index['blah']
[['baz'], ['foo', 'sub1']]

python how to build a new list from this one

I have a list in the following format:
['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d',
'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
I want to create a new list which looks like like this:
['CASE_1:a,b,c,d','CASE_2:e,f,g,h']
Any idea how to get this done elegantly??

You can use a defaultdict by treating case as the key, and appending to the list each letter, where case and the letter are obtained by splitting the elements of your list on ':' - such as:
from collections import defaultdict
case_letters = defaultdict(list)
start = ['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d', 'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
for el in start:
case, letter = el.split(':')
case_letters[case].append(letter)
result = sorted('{case}:{letters}'.format(case=key, letters=','.join(values)) for key, values in case_letters.iteritems())
print result
As this is homework (edit: or was!!?) - I recommend looking at collections.defaultdict, str.split (and other builtin string methods), at the builtin type list and it's methods (such as append, extend, sort etc...), str.format, the builtin sorted method and generally a dict in general. Use the working example here along with the final manual for reference - all these things will come in handy later on - so it's in your best interest to understand them as best you can.
One other thing to consider is that having something like:
{1: ['a', 'b', 'c', 'd'], 2: ['e', 'f', 'g', 'h']}
is a lot more of a useful format and could be used to recreate your desired list afterwards anyway...

I've deleted my full solution since I realized this is homework, but here's the basic idea:
A dictionary is a better data structure. I would look at a collections.defaultdict. e.g.
yourdict = defaultdict(list)
You can iterate through your list (splitting each element on ':'). Something like:
#only split string once -- resulting in a list of length 2.
case, value = element.split(':',1)
Then you can add these to the dict using the list .append method:
yourdict[case].append(value)
Now, you'll have a dict which maps keys (Case_1, Case_2) to lists (['a','b','c','d'], [...]).
If you really need a list, you can sort the items of the dictionary and join appropriately.
sigh. It looks like the homework tag has been removed (here's my original solution):
from collections import defaultdict
d = defaultdict(list)
for elem in yourlist:
case, value = elem.split(':', 1)
d[case].append(value)
Now you have a dictionary as I described above. If you really want to get your list back:
new_lst = [ case+':'+','.join(values) for case,values in sorted(d.items()) ]

data = ['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d', 'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
output = {}
for item in data:
key, value = item.split(':')
if key not in output:
output[key] = []
output[key].append(value)
result = []
for key, values in output.items():
result.append('%s:%s' % (key, ",".join(values)))
print result
outputs
['CASE_2:e,f,g,h', 'CASE_1:a,b,c,d']

mydict = {}
for item in list:
key,value = item.split(":")
if key in mydict:
mydict[key].append(value)
else:
mydict[key] = [value]
[key + ":" + ",".join(value) for key, value in mydict.iteritems()]
Not much elegance, to be honest. You know, I'd store your list as a dict, cause it behaves as a dict in fact.
output is ['CASE_2:e,f,g,h', 'CASE_1:a,b,c,d']

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python - recursively deleting dict keys? - python

Related

How to check if a dictionary is invertible

Find in which of multiple sets a value belongs to

For loop inside while loop not working in Python

Search value in tree of dicts in python

python how to build a new list from this one

Categories

Resources