Find which of multiple sets a value belongs to - Python

I have several sets of values and need to check which of a chosen subset of them contains a given value, and return the name of that set.
value = 'a'
set_1 = {'a', 'b', 'c'}
set_2 = {'d', 'e', 'f'}
set_3 = {'g', 'h', 'i'}
set_4 = {'a', 'e', 'i'}
I'd like to check if value exists in sets 1-3, without including set_4 in the method, and return the set name. So something like:
find_set(value in set_1, set_2, set_3)
should return
set_1
Maybe some neat lambda function? I tried
w = next(n for n,v in filter(lambda t: isinstance(t[1],set), globals().items()) if value in v)
(from "Find if value exists in multiple lists"), but that approach checks ALL local/global sets. That won't work here, because the value can exist in several of them. I need to be able to specify which sets to look in.

Don't use an ugly hackish lambda which digs in globals so you can get a name; that will confuse anyone reading your code including yourself after a few weeks :-).
You want to be able to get a name for sets you have defined; well, this is why we have dictionaries. Make a dictionary out of your sets, and then you can write handy, readable set/list comprehensions to get what you want in a compact fashion:
>>> d = {'set_1': set_1, 'set_2': set_2, 'set_3': set_3, 'set_4': set_4}
To catch all sets in which 'a' is located:
>>> {name for name, items in d.items() if 'a' in items}
{'set_1', 'set_4'}
To exclude some name, add the required clause to the if used for filtering:
>>> {name for name, items in d.items() if 'a' in items and name != 'set_4'}
{'set_1'}
You can of course factor this into a function and be happy you'll be able to understand it if you bump into it in the future:
def find_sets(val, *excludes, d=d):
    return {n for n, i in d.items() if val in i and n not in excludes}
This behaves the same way as the previous comprehension. d=d is probably not the way you want to do it; you'll probably be better off using some **d syntax for this.
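One way to read the **d suggestion is to have the caller pass the sets as keyword arguments. The names then come from the call site, and only the sets you list are searched, so set_4 can simply be left out. Here is a minimal sketch (find_sets_kw is a hypothetical name, not from the answer above):

```python
def find_sets_kw(val, *excludes, **named_sets):
    # named_sets maps each keyword name to its set; only these sets are searched
    return {name for name, items in named_sets.items()
            if val in items and name not in excludes}

set_1 = {'a', 'b', 'c'}
set_2 = {'d', 'e', 'f'}
set_3 = {'g', 'h', 'i'}
set_4 = {'a', 'e', 'i'}

# set_4 is not passed, so it is never searched
print(find_sets_kw('a', set_1=set_1, set_2=set_2, set_3=set_3))  # {'set_1'}
```

Positional arguments after val are still treated as names to exclude, e.g. find_sets_kw('a', 'set_4', set_1=set_1, set_4=set_4).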
If you just want to get the first value, return the next(comprehension) from your function like this:
def find_sets(val, *excludes, d=d):
    return next((n for n, i in d.items() if val in i and n not in excludes), '')
The '' just indicates a default value to be returned if no elements are actually found, that is, when called with a value that isn't present, an empty string will be returned (subject to change according to your preferences):
>>> find_sets('1')
''

Related

How to check if a dictionary is invertible

I am working on a question that requires pointing out the problem in a function that determines whether a dictionary is invertible (for every value appearing in the dictionary, there is only one key that maps to that value) or not. The question is below:
def is_invertible(adict):
    inv_dict = make_inv_dict(adict)
    return adict == inv_dict

def make_inv_dict(adict):
    if len(adict) > 0:
        key, val = adict.popitem()
        adict = make_inv_dict(adict)
        if val not in adict.values():
            adict[key] = val
        return adict
    else:
        return {}
Currently, this returns False for {'a': 'b', 'b': 'e', 'c': 'f'} when it is supposed to be True. I am sure that there is an issue in make_inv_dict function; is it simply because adict is not an appropriate variable name in adict = make_inv_dict(adict)? Or is there another reason why the function returns a wrong outcome?
At least three problems with the function you've given:
1. The condition adict == inv_dict checks whether the dictionary is its own inverse, not merely that it's invertible.
2. It uses popitem to remove each key/value pair from the input dictionary and then re-inserts it into the dictionary returned by the recursive call, so the input is consumed in place. By the time it's finished, adict's original contents will be completely destroyed, so the comparison will be meaningless anyway.
3. The line adict[key] = val inserts the key/value pair in its original orientation; the inverse would be adict[val] = key. So this function doesn't do what its name promises, which is to make an inverse dictionary.
It should be noted that if not for the destruction of the dictionary (2.), the mistakes (1.) and (3.) would cancel out, because the outcome of the function is to rebuild the original dictionary but without duplicate values.
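Running the question's function on the question's own example shows problems (2.) and (3.) directly. The input ends up empty, and the result holds the original pairs, not inverted ones. The function is copied unchanged apart from indentation so that the snippet runs on its own:

```python
def make_inv_dict(adict):
    # The question's function, unchanged apart from indentation
    if len(adict) > 0:
        key, val = adict.popitem()      # mutates the caller's dict
        adict = make_inv_dict(adict)    # rebinds adict to the rebuilt dict
        if val not in adict.values():
            adict[key] = val            # stores key -> val, not val -> key
        return adict
    else:
        return {}

original = {'a': 'b', 'b': 'e', 'c': 'f'}
rebuilt = make_inv_dict(original)
print(original)  # {} (emptied by popitem)
print(rebuilt)   # {'a': 'b', 'b': 'e', 'c': 'f'} (same pairs, not inverted)
```

Since is_invertible compares the now-empty input with the rebuilt dictionary, it returns False for this example.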
I'm guessing some people will find this question if they're looking for a correct way to invert a dictionary, so here is one: this function returns the inverse dictionary if it's possible, or None otherwise.
def invert_dict(d):
    out = dict()
    for k, v in d.items():  # iterate over the argument d, not the dict type
        if v in out:
            return None  # a repeated value means d is not invertible
        out[v] = k
    return out
Helper function returning a boolean for whether a dictionary is invertible:
def is_invertible(d):
    return invert_dict(d) is not None
My Answer:
def is_invertible(dict_var):
    return len(dict_var.values()) == len(set(dict_var.values()))
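Here is a quick check of the corrected invert_dict and the length-based test, run on the question's example plus one non-invertible dict. Both functions are repeated so that the snippet runs by itself. has_unique_values is a hypothetical name for the "My Answer" approach, and like invert_dict it assumes the values are hashable:

```python
def invert_dict(d):
    out = {}
    for k, v in d.items():
        if v in out:
            return None  # a repeated value means d is not invertible
        out[v] = k
    return out

def has_unique_values(d):
    # Invertible exactly when no value appears twice
    values = list(d.values())
    return len(values) == len(set(values))

ok = {'a': 'b', 'b': 'e', 'c': 'f'}   # the question's example
bad = {'a': 1, 'b': 1}

print(invert_dict(ok))         # {'b': 'a', 'e': 'b', 'f': 'c'}
print(invert_dict(bad))        # None
print(has_unique_values(ok))   # True
print(has_unique_values(bad))  # False
```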

How to check a dictionary where values are lists for an element of that list?

If I have a dictionary where each value is a list, how can I check if there is a specific element in my list? For example:
myDict = {0: ['a', 'b', 'c'],
          1: ['d', 'e', 'f']}
How can I check if 'a' exists?
You can use any:
any('a' in lst for lst in myDict.values())
This will stop the iteration and evaluate to True on the first find. any is the built-in short-cut for the following pattern:
for x in y:
    if condition:
        return True
return False
# return any(condition for x in y)
It always strikes me as strange when someone wants to scan the values of a dictionary. It's highly inefficient if done many times.
Instead, I'd build another dictionary, or a set for a quick check:
myDict = {0: ['a', 'b', 'c'],
          1: ['d', 'e', 'f']}
rset = {x for v in myDict.values() for x in v}
print(rset)
gives:
{'b', 'e', 'c', 'd', 'a', 'f'}
now:
'a' in rset
is super fast and concise. Build as many sets & dictionaries as you need on your original data set to get a fast lookup.
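The "another dictionary" option can be sketched as a reverse index. It maps every element to the keys whose lists contain it. It is built once, and after that each lookup is a single dict access. This sketch assumes the list elements are hashable; index is just an illustrative name:

```python
myDict = {0: ['a', 'b', 'c'],
          1: ['d', 'e', 'f']}

# Build once: element -> set of keys whose list contains that element
index = {}
for key, values in myDict.items():
    for x in values:
        index.setdefault(x, set()).add(key)

print('a' in index)           # True
print(index['e'])             # {1}
print(index.get('z', set()))  # set()
```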
Check all values
We can use itertools.chain in a rather self-explanatory one-liner:
from itertools import chain
if 'a' in chain.from_iterable(myDict.values()):
    # do something
    pass
Here we chain the lists in .values() together into one iterable and check membership of 'a' in it.
Note that this runs in linear time in the total number of elements in the lists. If you only have to perform the membership check once, we cannot do much about it, but if you have to check it multiple times, it is better to cache the values in a set (given the values are hashable).
Check a specific key
In case you want to check a specific key, we can just lookup the corresponding value and check membership:
if 'a' in myDict[0]:
    # do something
    pass
If it is not certain that the key is present in myDict, and we want False in that case, we can use .get(..) with () (the empty tuple) as a fallback value:
# will not error, but gives False in case the key does not exist
if 'a' in myDict.get(0, ()):
    # do something
    pass

python - recursively deleting dict keys?

I'm using Python 2.7 with plistlib to import a .plist in a nested dict/array form, then look for a particular key and delete it wherever I see it.
When it comes to the actual files we're working with in the office, I already know where to find the values -- but I wrote my script with the idea that I didn't, in the hopes that I wouldn't have to make changes in the future if the file structure changes or we need to do likewise to other similar files.
Unfortunately I seem to be trying to modify a dict while iterating over it, but I'm not certain how that's actually happening, since I'm using iteritems() and enumerate() to get generators and work with those instead of the object I'm actually working with.
def scrub(someobject, badvalue='_default'): ##_default isn't the real variable
    """Walks the structure of a plistlib-created dict and finds all the badvalues and viciously eliminates them.
    Can optionally be passed a different key to search for."""
    count = 0
    try:
        iterator = someobject.iteritems()
    except AttributeError:
        iterator = enumerate(someobject)
    for key, value in iterator:
        try:
            scrub(value)
        except:
            pass
        if key == badvalue:
            del someobject[key]
            count += 1
    return "Removed {count} instances of {badvalue} from {file}.".format(count=count, badvalue=badvalue, file=file)
Unfortunately, when I run this on my test .plist file, I get the following error:
Traceback (most recent call last):
  File "formscrub.py", line 45, in <module>
    scrub(loadedplist)
  File "formscrub.py", line 19, in scrub
    for key, value in iterator:
RuntimeError: dictionary changed size during iteration
So the problem might be the recursive call to itself, but even then shouldn't it just be removing from the original object? I'm not sure how to avoid recursion (or if that's the right strategy) but since it's a .plist, I do need to be able to identify when things are dicts or lists and iterate over them in search of either (a) more dicts to search, or (b) the actual key-value pair in the imported .plist that I need to delete.
Ultimately, this is a partial non-issue, in that the files I'll be working with on a regular basis have a known structure. However, I was really hoping to create something that doesn't care about the nesting or order of the object it's working with, as long as it's a Python dict with arrays in it.
Adding or removing items to/from a sequence while iterating over this sequence is tricky at best, and just illegal (as you just discovered) with dicts. The right way to remove entries from a dict while iterating over it is to iterate on a snapshot of the keys. In Python 2.x, dict.keys() provides such a snapshot. So for dicts the solution is:
for key in mydict.keys():
    if key == bad_value:
        del mydict[key]
As mentioned by cpizza in a comment, in Python 3 you'll need to create the snapshot explicitly using list():
for key in list(mydict.keys()):
    if key == bad_value:
        del mydict[key]
For lists, trying to iterate over a snapshot of the indexes (i.e. for i in range(len(thelist)):) would result in an IndexError as soon as anything is removed (obviously, since at least the last index will no longer exist). Even if not, you might skip one or more items, because removing an item shifts the remaining ones and puts the indexes out of sync with the list itself. enumerate is safe against IndexError (the iteration stops by itself when there's no 'next' item left in the list), but you'll still skip items:
>>> mylist = list("aabbccddeeffgghhii")
>>> for x, v in enumerate(mylist):
...     if v in "bdfh":
...         del mylist[x]
...
>>> print(mylist)
['a', 'a', 'b', 'c', 'c', 'd', 'e', 'e', 'f', 'g', 'g', 'h', 'i', 'i']
Not quite a success, as you can see.
The usual solution here is to iterate over the indexes in reverse, i.e.:
>>> mylist = list("aabbccddeeffgghhii")
>>> for x in reversed(range(len(mylist))):
...     if mylist[x] in "bdfh":
...         del mylist[x]
...
>>> print(mylist)
['a', 'a', 'c', 'c', 'e', 'e', 'g', 'g', 'i', 'i']
This works with reversed enumeration too, but we don't really need it here.
So to summarize: you need two different code paths for dicts and lists, and you also need to take care of "non-container" values (values which are neither lists nor dicts), something your current code does not handle.
def scrub(obj, bad_key="_this_is_bad"):
    if isinstance(obj, dict):
        # the call to `list` is useless for py2 but makes
        # the code py2/py3 compatible
        for key in list(obj.keys()):
            if key == bad_key:
                del obj[key]
            else:
                scrub(obj[key], bad_key)
    elif isinstance(obj, list):
        for i in reversed(range(len(obj))):
            if obj[i] == bad_key:
                del obj[i]
            else:
                scrub(obj[i], bad_key)
    else:
        # neither a dict nor a list, do nothing
        pass
As a side note: never write a bare except clause. Never ever. This should be illegal syntax, really.
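A different technique avoids deleting during iteration altogether: build a cleaned copy instead of modifying the structure in place. The sketch below (scrubbed is a hypothetical name) follows the same rules as scrub above. It drops dict entries whose key is bad_key and list items equal to bad_key, and it leaves the input untouched. The dict comprehension needs Python 2.7 or later:

```python
def scrubbed(obj, bad_key="_this_is_bad"):
    """Return a cleaned copy of obj; obj itself is never modified."""
    if isinstance(obj, dict):
        # Drop entries whose key is bad_key, clean the remaining values
        return {k: scrubbed(v, bad_key) for k, v in obj.items() if k != bad_key}
    if isinstance(obj, list):
        # Drop items equal to bad_key, clean the remaining items
        return [scrubbed(item, bad_key) for item in obj if item != bad_key]
    # Neither a dict nor a list: return it unchanged
    return obj

data = {"keep": [1, "_this_is_bad", {"_this_is_bad": 0, "x": 2}],
        "_this_is_bad": {"nested": True}}
print(scrubbed(data))  # {'keep': [1, {'x': 2}]}
```

With this approach you keep working with (or write back) the returned object rather than the one you loaded.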
Here is a generalized version of bruno desthuilliers' answer, with a callable to test the keys against.
def clean_dict(obj, func):
    """
    Walk the entire 'obj' and delete every key for which 'func' returns True
    (for lists, 'func' is applied to the items).
    :param obj: a dictionary or a list of dictionaries to clean
    :param func: a callable that takes a key as its argument and returns True for each key to delete
    """
    if isinstance(obj, dict):
        # the call to `list` is useless for py2 but makes
        # the code py2/py3 compatible
        for key in list(obj.keys()):
            if func(key):
                del obj[key]
            else:
                clean_dict(obj[key], func)
    elif isinstance(obj, list):
        for i in reversed(range(len(obj))):
            if func(obj[i]):
                del obj[i]
            else:
                clean_dict(obj[i], func)
    else:
        # neither a dict nor a list, do nothing
        pass
And an example with a regex callable:
import re
func = lambda key: re.match(r"^<div>", key)
clean_dict(obj, func)
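There is one caveat with the regex example. clean_dict also passes list items to func, and re.match raises TypeError for anything that isn't a string. A guarded callable avoids that. This sketch assumes Python 3, where str covers all text, and is_div is a hypothetical name:

```python
import re

def is_div(key):
    # Only strings can be matched; anything else is never deleted
    return isinstance(key, str) and re.match(r"^<div>", key) is not None

print(is_div("<div>header"))  # True
print(is_div("<span>"))       # False
print(is_div(42))             # False
```

It can be passed directly as clean_dict(obj, is_div).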
def walk(d, badvalue, answer=None, sofar=None):
    # Collect the key path to every occurrence of badvalue in nested dicts
    if sofar is None:
        sofar = []
    if answer is None:
        answer = []
    for k, v in d.items():
        if k == badvalue:
            answer.append(sofar + [k])
        elif isinstance(v, dict):
            # don't descend into a subtree that is about to be deleted
            walk(v, badvalue, answer, sofar + [k])
    return answer

def delKeys(d, badvalue):
    for path in walk(d, badvalue):
        dd = d
        while len(path) > 1:
            dd = dd[path[0]]
            path.pop(0)
        dd.pop(path[0])
Output
In [30]: d = {1:{2:3}, 2:{3:4}, 5:{6:{2:3}, 7:{1:2, 2:3}}, 3:4}
In [31]: delKeys(d, 2)
In [32]: d
Out[32]: {1: {}, 3: 4, 5: {6: {}, 7: {1: 2}}}

Determine position of values in dictionary, Python

There is a dictionary that may include keys starting from 0 and values a, b, c, d, e. Each time, the values may be assigned to different keys. The size of the dictionary may change as well.
I am interested in two values. Let's call them b and d.
Is there any algorithm that determines when b appears earlier than d (i.e. b's key is smaller than d's) and when d appears earlier than b (i.e. d's key is smaller than b's)?
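A dictionary has no meaningful positional order (since Python 3.7 it remembers insertion order, but that is unrelated to the order of the keys themselves). So your wording "b's key is smaller than d's" is the right one.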
Now, it looks like you could swap keys and values...
If the values are hashable then you could generate a reverse dictionary and check the values. Otherwise, you'll need to brute-force it.
def dictfind(din, tsent, fsent):
    # Scan keys in ascending order; whichever sentinel is seen first wins
    for k in sorted(din):
        if din[k] == tsent:
            return True
        if din[k] == fsent:
            return False
    else:
        raise ValueError('No match found')

D = {0:'a', 1:'b', 2:'c', 3:'d', 4:'e'}
print(dictfind(D, 'b', 'd'))
Dictionaries are unordered collections of key-value pairs, and dict.keys() is not guaranteed to return the keys in sorted order. Can't you do what you want with lists?
First create your dictionary
>>> import random
>>> keys = list(range(5))
>>> random.shuffle(keys)
>>> d=dict(zip(keys, "abcde"))
>>> d
{0: 'd', 1: 'c', 2: 'e', 3: 'b', 4: 'a'}
Now create a dictionary using the keys of d as the values and the values of d as the keys
>>> rev_d = dict((v,k) for k,v in d.items())
Your comparisons are now just regular dictionary lookups
>>> rev_d['b'] > rev_d['d']
True
From your comment on gnibbler's answer, it sounds like when there are multiple occurrences of a value, you only care about the earliest appearing one. In that case, the swapped (value, key)-dictionary suggested can still be used, but with minor modification to how you build it.
xs = {0: 'a', 1: 'b', 2: 'a'}
ys = {}
for k, v in xs.items():
    if v not in ys or k < ys[v]:
        ys[v] = k
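The same "earliest key wins" mapping can also be built by visiting the keys in ascending order and letting setdefault keep the first key it sees for each value. This is a sketch, assuming the keys are mutually comparable and the values hashable:

```python
xs = {0: 'a', 1: 'b', 2: 'a'}

ys = {}
for k in sorted(xs):
    # Keys arrive smallest first, so setdefault keeps the earliest key per value
    ys.setdefault(xs[k], k)

print(ys)  # {'a': 0, 'b': 1}
```

This sorts the keys once (O(n log n)), while the comparison loop above stays O(n). In return, no explicit comparison is needed.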
You could then define a function that tells you which of two values maps to a smaller index:
def earlier(index_map, a, b):
    """Returns `a` or `b` depending on which has a smaller value in `index_map`.
    Returns `None` if either `a` or `b` is not in `index_map`.
    """
    if a not in index_map or b not in index_map:
        return None
    if index_map[a] < index_map[b]:
        return a
    return b
Usage:
print(earlier(ys, 'a', 'b'))
There are some subtleties here whose resolution depends on your particular problem.
What should happen if a or b is not in index_map? Right now we return None.
What should happen if index_map[a] == index_map[b]? From your comments it sounds like this may not happen in your case, but you should consider it. Right now we return b.

Efficient way to either create a list, or append to it if one already exists?

I'm going through a whole bunch of tuples with a many-to-many correlation, and I want to make a dictionary where each b of (a,b) has a list of all the a's that correspond to a b. It seems awkward to test for a list at key b in the dictionary, then look for an a, then append a if it's not already there, every single time through the tuple digesting loop; but I haven't found a better way yet. Does one exist? Is there some other way to do this that's a lot prettier?
See the docs for the setdefault() method:
setdefault(key[, default])
    If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
You can use this as a single call that will get the list stored at b if it exists, or set b to an empty list if it doesn't already exist, and either way return that list:
>>> d = {}
>>> key = 'b'
>>> val = 'a'
>>> print(d)
{}
>>> d.setdefault(key, []).append(val)
>>> print(d)
{'b': ['a']}
>>> d.setdefault(key, []).append('zee')
>>> print(d)
{'b': ['a', 'zee']}
Combine this with a simple "not in" check and you've done what you're after in three lines:
>>> val = 'c'
>>> b = d.setdefault('b', [])
>>> if val not in b:
...     b.append(val)
...
>>> print(d)
{'b': ['a', 'zee', 'c']}
Assuming you're not really tied to lists, defaultdict and set are quite handy.
import collections
d = collections.defaultdict(set)
for a, b in mappings:
    d[b].add(a)
If you really want lists instead of sets, you could follow this with:
for k, v in d.items():
    d[k] = list(v)
And if you really want a dict instead of a defaultdict, you can say
d = dict(d)
I don't really see any reason you'd want to, though.
Use collections.defaultdict:
from collections import defaultdict
your_dict = defaultdict(list)
for (a, b) in your_list:
    your_dict[b].append(a)
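The question also wants each a listed only once per b. Here is a minimal sketch on made-up sample pairs that keeps defaultdict(list) and skips duplicates with a membership test. That test is linear in the list length, so for long lists the defaultdict(set) answer is the better fit:

```python
from collections import defaultdict

# Made-up (a, b) pairs; note the repeated ('b', 1)
pairs = [('a', 1), ('a', 3), ('b', 1), ('f', 1), ('a', 2), ('z', 1), ('b', 1)]

grouped = defaultdict(list)
for a, b in pairs:
    if a not in grouped[b]:  # keep each a only once per b
        grouped[b].append(a)

print(dict(grouped))  # {1: ['a', 'b', 'f', 'z'], 3: ['a'], 2: ['a']}
```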
You can sort your tuples in O(n log n) and then create your dictionary in O(n),
or, more simply, do it directly in O(n), though this could put a heavy load on memory if there are many tuples:
your_dict = {}
for (a, b) in your_list:
    if b in your_dict:
        your_dict[b].append(a)
    else:
        your_dict[b] = [a]
Hmm it's pretty much the same as you've described. What's awkward about that?
You could also consider using an sql database to do the dirty work.
Instead of using an if, AFAIK it is more Pythonic to use a try block.
your_list = [('a', 1), ('a', 3), ('b', 1), ('f', 1), ('a', 2), ('z', 1)]
your_dict = {}
for (a, b) in your_list:
    try:
        your_dict[b].append(a)
    except KeyError:
        your_dict[b] = [a]
print(your_dict)
I am not sure how you will get out of the key test, but once the key/value pair has been initialized it is easy :)
d = {}
if 'b' not in d:
    d['b'] = set()
d['b'].add('a')
The set will ensure that only one 'a' is in the collection. You still need the initial 'b' check, though, to make sure the key/value pair exists.
Dict get method?
It returns the value of my_dict[some_key] if some_key is in the dictionary, and if not, returns some default value ([] in the example below). Since list.append returns None, assign a new list built with + rather than the result of append:
my_dict[some_key] = my_dict.get(some_key, []) + [something_else]
There's another way that's rather efficient (though maybe not as efficient as sets) and simple. It's similar in practice to defaultdict but does not require an additional import.
Since you have to create the dict keys somewhere anyway, you can create them all up front, each with its own empty list. dict.fromkeys(keylist, []) looks tempting, but it assigns the same list object to every key, so appending under one key would show up under all of them. A dict comprehension gives each key a separate list:
keylist = ['key1', 'key2']
result = {key: [] for key in keylist}
where result will be:
{'key1': [], 'key2': []}
Then you can do your loop and use result['key1'].append(..) directly.
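The following shows why a mutable default is risky with dict.fromkeys. Every key receives the very same list object, whereas a dict comprehension creates a fresh list for each key:

```python
keylist = ['key1', 'key2']

shared = dict.fromkeys(keylist, [])
shared['key1'].append('x')
print(shared)     # {'key1': ['x'], 'key2': ['x']} (one list under both keys)

separate = {key: [] for key in keylist}
separate['key1'].append('x')
print(separate)   # {'key1': ['x'], 'key2': []}
```

dict.fromkeys is still fine with immutable defaults such as None or 0.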
