Using suffixes for dictionary searching - python

P.S.: The duplicate questions raised so far concern prefixes (thanks for those anyway).
This question is on suffixes.
With dictionary
dic={"abcd":2, "bbcd":2, "abgg":2}
Is it possible to search the dictionary using suffix of the string, i.e., if given "bcd", it will return me two entries
{"abcd":2, "bbcd":2}
One possible way:
dic1 = {}
for k, v in dic.items():
    if k.endswith("bcd"):
        dic1[k] = v
Is it possible to do it more efficiently?

For a small problem set you can do it with a simple list comprehension:
suffixed = [v for k, v in dic.items() if k.endswith("bcd")]
however that means doing a substring check on every item in the dictionary every time you query. If that's slow on big data sets you can make a second dictionary of the original keys as an acceleration. You'd have to do a one time pre-pass:
# Index the original dictionary (dic), not the filtered result (dic1)
suffixes = {k[-3:]: [] for k in dic}
for k in dic:
    suffixes[k[-3:]].append(dic[k])
That would give you all the results for each suffix. You could store the keys instead of the values the same way and then chain to a lookup.
In any event, hashed lookups of dictionary keys are very cheap, so it's best to cache your data in a dictionary with the keys you want (i.e., suffixes) rather than looping over every key doing string comparisons.
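The pre-pass above only covers three-character suffixes. Here is a minimal sketch of the same caching idea for suffixes of any length, assuming you can afford one index entry per suffix of each key. The names build_suffix_index and search_by_suffix are hypothetical, not part of any library.

```python
from collections import defaultdict

def build_suffix_index(d):
    """Map every non-empty suffix of every key to the keys ending with it."""
    index = defaultdict(list)
    for key in d:
        for start in range(len(key)):
            index[key[start:]].append(key)
    return dict(index)  # plain dict: lookups of unknown suffixes create nothing

def search_by_suffix(d, index, suffix):
    """Return the sub-dictionary of d whose keys end with suffix."""
    return {key: d[key] for key in index.get(suffix, ())}

dic = {"abcd": 2, "bbcd": 2, "abgg": 2}
index = build_suffix_index(dic)
print(search_by_suffix(dic, index, "bcd"))
```

Each query is then a single hashed lookup plus the cost of building the result. The price is the one-time pre-pass and the extra memory, and the index has to be rebuilt or updated whenever dic changes.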

Related

Python: Obtaining index of an element within a value list of a dictionary

I have a dictionary with key:value list pairings, and I intend to find the index of the value list that contains the desired element.
E.g., if the dictionary is:
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
Then, given element 'v2' I should be able to get index 2.
For a value list with one element, the index can be obtained with: list(my_dict.values()).index(['v1']) , however this approach does not work with lists containing multiple elements.
Using for loop, it can be obtained via:
for key, value in my_dict.items():
    if value is None:
        continue
    if 'v2' in value:
        print(list(my_dict.keys()).index(key))
Is there a neater (pythonic) way to obtain the same?
You've got an XY problem. You want to know the key that points to a value, and you think you need to find the enumeration index iterating the values so you can then use it to find the key by iteration as well. You don't need all that. Just find the key directly:
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
value = 'v2'
# Iterate key/vals pairs in genexpr; if the vals contains value, yield the key,
# next stops immediately for the first key yielded so you don't iterate the whole dict
# when the value is found on an early key
key_for_value = next(key for key, vals in my_dict.items() if vals and value in vals)
print(key_for_value)
That'll raise StopIteration if the value doesn't exist, otherwise it directly retrieves the first key where the values list for that key contains the desired value.
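If a missing value is an expected case rather than an error, the built-in next accepts a default as its second argument. This is a small sketch of that variant; the helper name find_key is hypothetical.

```python
my_dict = {"key1": ['v1'], "key2": None, "key3": ['v2', 'v3'], "key4": ['v4', 'v5', 'v6']}

def find_key(d, value, default=None):
    """Return the first key whose value list contains value, else default."""
    # The generator needs its own parentheses when next() gets a second argument
    return next((key for key, vals in d.items() if vals and value in vals), default)

print(find_key(my_dict, 'v2'))
print(find_key(my_dict, 'missing'))
```

The first call finds key3; the second returns None instead of raising StopIteration.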
If you don't really have an XY problem, and the index is important (it shouldn't be, that's a misuse of dicts) it's trivial to produce it as well, changing the extraction of the key to get both, e.g.:
index, key_for_value = next((i, key) for i, (key, vals) in enumerate(my_dict.items()) if vals and value in vals)
Mind you, this is a terrible solution if you need to perform these lookups a lot and my_dict isn't trivially small; it's O(n) on the total number of values, so a large dict would take quite a while to check (relative to the cost of just looking up an arbitrary key, which is average-case O(1)). In that case, ideally, if my_dict doesn't change much/at all, you'd construct a reversed dictionary up-front to find the key(s) associated with a value, e.g.:
from collections import defaultdict
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
reversed_my_dict = defaultdict(set)
for key, vals in my_dict.items():
    for val in vals or ():  # "key2" maps to None; treat it as having no values
        reversed_my_dict[val].add(key)
reversed_my_dict = dict(reversed_my_dict)  # Optional: Prevents future autovivification of keys
                                           # by converting back to plain dict
after which you can cheaply determine the key(s) associated with a given value with:
reversed_my_dict.get(value, ()) # Using .get prevents autovivification of keys even if still a defaultdict
which returns the set of all keys that map to that value, if any, or the empty tuple if not (if you convert back to dict above, reversed_my_dict[value] would also work if you'd prefer to get a KeyError when the value is missing entirely; leaving it a defaultdict(set) would silently construct a new empty set, map it to the key and return it, which is fine if this happens rarely, but a problem if you test thousands of unmapped values and create a corresponding thousands of empty sets for no benefit, consuming memory wastefully).
Which you choose depends on how big my_dict is (for small my_dict, O(n) work doesn't really matter that much), how many times you need to search it (fewer searches mean less gain from reversed dict), and whether it's regularly modified. For that last point, if it's never modified, or rarely modified between lookups, rebuilding the reversed dict from scratch after each modification might be worth it for simplicity (assuming you perform many lookups per rebuild); if it's frequently modified, the reversed dict might still be worth it, you'd just have to update both the forward and reversed dicts rather than just one, e.g., expanding:
# New key
my_dict[newkey] = [newval1, newval2]
# Add value
my_dict[existingkey].append(newval)
# Delete value
my_dict[existingkey].remove(badval)
# Delete key
del my_dict[existingkey]
to:
# New key
newvals = my_dict[newkey] = [newval1, newval2]
for newval in newvals:
    reversed_my_dict[newval].add(newkey)  # reversed_my_dict.setdefault(newval, set()).add(newkey) if not defaultdict(set) anymore
# Add value
my_dict[existingkey].append(newval)
reversed_my_dict[newval].add(existingkey)  # reversed_my_dict.setdefault(newval, set()).add(existingkey) if not defaultdict(set) anymore
# Delete value
my_dict[existingkey].remove(badval)
if badval not in my_dict[existingkey]:  # Removed last copy; test only needed if one key can hold same value more than once
    reversed_my_dict[badval].discard(existingkey)
    # Optional delete badval from reverse mapping if last key removed:
    if not reversed_my_dict[badval]:
        del reversed_my_dict[badval]
# Delete key
# set() conversion not needed if my_dict's value lists guaranteed not to contain duplicates
for badval in set(my_dict.pop(existingkey)):
    reversed_my_dict[badval].discard(existingkey)
    # Optional delete badval from reverse mapping if last key removed:
    if not reversed_my_dict[badval]:
        del reversed_my_dict[badval]
respectively, roughly doubling the work incurred by modifications, in exchange for always getting O(1) lookups in either direction.
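One way to keep the two dicts from drifting apart is to hide both behind a small wrapper so that every modification goes through one place. This is only a sketch under that assumption; the class TwoWayIndex and its method names are hypothetical, and it covers only adding and removing single values.

```python
class TwoWayIndex:
    """Keeps key -> list of values and value -> set of keys in sync."""

    def __init__(self):
        self.forward = {}   # key -> list of values
        self.reverse = {}   # value -> set of keys

    def add(self, key, val):
        self.forward.setdefault(key, []).append(val)
        self.reverse.setdefault(val, set()).add(key)

    def remove(self, key, val):
        vals = self.forward[key]   # KeyError if key is unknown
        vals.remove(val)           # ValueError if val is not under key
        if val not in vals:        # last copy of val under this key is gone
            keys = self.reverse[val]
            keys.discard(key)
            if not keys:           # no key holds val anymore
                del self.reverse[val]

    def keys_for(self, val):
        # Return a copy so callers cannot corrupt the reverse mapping
        return set(self.reverse.get(val, ()))

idx = TwoWayIndex()
idx.add("key3", "v2")
idx.add("key3", "v3")
idx.add("key4", "v2")
print(idx.keys_for("v2"))
idx.remove("key3", "v2")
print(idx.keys_for("v2"))
```

Plain dicts with setdefault are used here instead of defaultdict so that lookups of unknown values never create empty entries.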
If you are looking for the key corresponding to a value, you can reverse the dictionary like so:
reverse_dict = {e: k for k, v in my_dict.items() if v for e in v}
Careful with duplicate values though. The last occurrence will override the previous ones.
Don't know if it's the best solution but this works:
value = 'v2'
list(map(lambda x : value in x, list(map(lambda x : x[1] or [], list(my_dict.items()))))).index(True)

Comparing multiple values within a dictionary and returning key

I have a dictionary where the keys are an arbitrary name and the values are an mtime of a file. Example:
{'server_1': 1506286408.854673, 'server_2': 1506286219.1254442, 'server_3':1506472359.154043}
I wish to iterate over comparing two of the values from the dictionary finding the largest of the two, and returning the key of said large value and continuing to do this until there is only a single key:val pair left.
I know there is a way of "ordering" dictionaries by value with some tricks provided by outside libraries like operator and defaultdict. However, I was curious if there was an easier way to accomplish this goal and avoid trying to sort a naturally unordered structure.
So the end result I would be looking for is the first iteration to return server_3, then server_1 and then stop there.
It looks like you want the dictionary's keys sorted by value, largest first, with the last (smallest) one dropped.
def keys_sorted_by_values(d):
    return [k for k, v in sorted(d.items(), key=lambda item: item[1], reverse=True)][:-1]
server_to_mtime = {'server_1': 1506286408.854673, 'server_2': 1506286219.1254442, 'server_3': 1506472359.154043}
for server in keys_sorted_by_values(server_to_mtime):
    print(server)
Output
server_3
server_1
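If you would rather not sort by hand, the standard library can do the selection for you. This sketch assumes Python 3 and uses max for the single newest entry and heapq.nlargest for all but the oldest; the variable names are illustrative.

```python
import heapq

server_to_mtime = {
    'server_1': 1506286408.854673,
    'server_2': 1506286219.1254442,
    'server_3': 1506472359.154043,
}

# Key with the largest mtime
newest = max(server_to_mtime, key=server_to_mtime.get)

# All keys except the one with the smallest mtime, largest first
all_but_oldest = heapq.nlargest(
    len(server_to_mtime) - 1, server_to_mtime, key=server_to_mtime.get
)

print(newest)
print(all_but_oldest)
```

Iterating a dict yields its keys, so passing server_to_mtime.get as the key function ranks each key by its mtime.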

looking for patterns in a dictionary and make a new dictionary

I have a list of all combinations of sequences can be made with 'K'
and 'M' and the lengths are from 6 to 18. so, I have combinations
including "KKKKKK" to "MMMMMMMMMMMMMMMMMM".
I have also a dictionary in which the keys are ids and the values are
long sequences made not only with K and M but also with some more
characters which are not important for me.
small example:
com = ["KKKKKK", "KKKKKM", ......, "MMMMMMMMMMMMMMMMMM"]
li = {id1: "KKKKKKHKJASGKKKMOOGBMMMMMMMMMMMMMMMMMM",
id2:"MMKFJDFKFGKJJJJFKKKKKMJKJHFKKKKKK"}
I want to find different combinations in the li dictionary(values) and
make a new dictionary in which the keys are ids from li dictionary
(the keys) and values are a list containing the combinations found in
the values of li dictionary. for the small example the output would be
like this:
results = {id1: ["KKKKKK", "MMMMMMMMMMMMMMMMMM"], id2: ["KKKKKM", "KKKKKK"] }
I wrote the following code but did not give me what I want.
results = {}
for i in com:
    if i in li.values():
        results[li.keys()] = [i]
You can use re.findall() within a dictionary comprehension:
In [11]: {k: re.findall(r'(?:K|M){6,18}', v) for k, v in li.items()}
Out[11]: {'id1': ['KKKKKK', 'MMMMMMMMMMMMMMM'], 'id2': ['KKKKKM', 'KKKKKK']}
r'(?:K|M){6,18}' is a regular expression that will match any substring of K or M with length 6 to 18.
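For reference, here is a self-contained sketch of that approach with the import included. It uses the character class [KM], which matches the same strings as (?:K|M). The sample sequences are assembled from pieces so that the run lengths are explicit; they are illustrative data, not the asker's real input.

```python
import re

li = {
    "id1": "KKKKKK" + "HKJASG" + "KKKM" + "OOGB" + "M" * 18,
    "id2": "MMKFJDFKFGKJJJJF" + "KKKKKM" + "JKJHF" + "KKKKKK",
}

pattern = re.compile(r"[KM]{6,18}")  # runs of K/M that are 6 to 18 long
results = {seq_id: pattern.findall(seq) for seq_id, seq in li.items()}

for seq_id, found in results.items():
    print(seq_id, found)
```

Note that findall is greedy and returns non-overlapping matches, so it reports each run of K and M once at its full length (up to 18). It does not list every shorter combination contained inside a longer run.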
The problem is here: if i in li.values():. This line will check if any of the dictionary's values equals the current combination. Instead, you want this:
for v in li.values():
    if i in v:
Which will check if any of the dict's values contains the current combination.
Also, this line results[li.keys()] = [i] will map all of the dict's keys to a new list. There are two problems with that: first, you want to map only the relevant key. Second, you want to add to the current list, not replace it with a new one.
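Putting both fixes together, a corrected version of the loop might look like the sketch below. The com list is shortened and the sequences are made-up stand-ins, so treat the data as hypothetical.

```python
com = ["KKKKKK", "KKKKKM", "MMMMMM"]  # shortened stand-in for the full list
li = {
    "id1": "KKKKKK" + "HKJASG" + "MMMMMMM",
    "id2": "MMKFJDF" + "KKKKKM" + "J",
}

results = {}
for seq_id, seq in li.items():      # iterate key and value together
    for combo in com:
        if combo in seq:            # substring test against this one sequence
            # append to this id's list, creating it on first use
            results.setdefault(seq_id, []).append(combo)

print(results)
```

Iterating over li.items() gives the relevant key directly, and setdefault adds to the existing list instead of replacing it.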

What's the fastest way to identify the 'name' of a dictionary that contains a specific key-value pair?

I'd like to identify the dictionary within the following list that contains the key-value pair 'Keya':'123a', which is ID1 in this case.
lst = {ID1:{'Keya':'123a','Keyb':456,'Keyc':789},ID2:{'Keya':'132a','Keyb':654,'Keyc':987},ID3:{'Keya':'5433a','Keyb':222,'Keyc':333},ID4:{'Keya':'444a','Keyb':777,'Keyc':666}}
It's safe to assume all dictionaries have the same key's, but have different values.
I currently have the following to identify which dictionary has the value '123a' for the key 'Keya', but is there a shorter and faster way?
DictionaryNames = map(lambda Dict: str(Dict),lst)
Dictionaries = [i[1] for i in lst.items()]
Dictionaries = map(lambda Dict: str(Dict),Dictionaries)
Dict = filter(lambda item:'123a' in item,Dictionaries)
val = DictionaryNames[Dictionaries.index(Dict[0])]
return val
If you actually had a list of dictionaries, this would be:
next(d for d in list_o_dicts if d[key]==value)
Since you actually have a dictionary of dictionaries, and you want the key associated with the dictionary, it's:
next(k for k, d in dict_o_dicts.items() if d[key]==value)
This returns the first matching value. If you're absolutely sure there is exactly one, or if you don't care which you get if there are more than one, and if you're happy with a StopIteration exception if you were wrong and there isn't one, that's exactly what you want.
If you need all matching values, just do the same with a list comprehension:
[k for k, d in dict_o_dicts.items() if d[key]==value]
That list can of course have 0, 1, or 17 values.
You can just do [name for name, d in lst.items() if d['Keya']=='123a'] to get a list of all the dictionaries in lst that have that value for that key. If you know there is only one, you can get it with [name for name, d in lst.items() if d['Keya']=='123a'][0]. (As Andy mentions in a comment, your name lst is misleading, since lst is actually a dictionary of dictionaries, not a list.)
Since you want the fastest, you should short-cut your search as soon as you find the data you are after. Iterating through the whole list is not necessary, nor is producing any temporary dictionary:
for key, data in lst.items():
    if data['Keya'] == '123a':
        return key  # or break if not in a function
A different way to do this is to use the appropriate data structure: Keep a "reverse map" of key-value pairs to names. If your dictionary of dictionaries is static after being built, you can build the reverse dictionary like this:
revdict = {(key, value): name
           for name, subdict in dictodicts.items()
           for key, value in subdict.items()}
If not, you just need to add revdict[key, value] = name for each d[name][key] = value statement and build them up in parallel.
Either way, to find the name of the dict that maps key to value, it's just:
revdict[key, value]
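As a runnable illustration of the reverse map, here is a sketch using part of the data from the question. The ID names are written as strings here, which is an assumption, since the question leaves ID1 and friends undefined.

```python
dictodicts = {
    'ID1': {'Keya': '123a', 'Keyb': 456, 'Keyc': 789},
    'ID2': {'Keya': '132a', 'Keyb': 654, 'Keyc': 987},
    'ID3': {'Keya': '5433a', 'Keyb': 222, 'Keyc': 333},
}

# (key, value) pair -> name of the inner dict that holds it
revdict = {(key, value): name
           for name, subdict in dictodicts.items()
           for key, value in subdict.items()}

print(revdict['Keya', '123a'])
print(revdict.get(('Keya', 'nope')))
```

This only works when the inner values are hashable, and if two inner dicts share the same key-value pair, the one built last wins. Using .get returns None for a missing pair instead of raising KeyError.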
For (a whole lot) more information (than you actually want), and some sample code for wrapping things up in different ways… I dug up an unfinished blog post, considered editing it, and decided to not bother and just clicked Publish instead, so: Reverse dictionary lookup and more, on beyond z.

How do I trust the order of a Python dictionary?

I'm trying to make a dictionary in Python that I can sort through but it seems to change order when I add new things. Is there a way around this?
A standard dictionary does not impose an ordering; it's simply a lookup.
You want an ordered dictionary, such as collections.OrderedDict.
Python dicts are built as hash tables -- great performance, but ordering is essentially arbitrary and unpredictable. If your need for predictably-ordered walks is occasional, and based on keys or values, the sorted built-in is very handy:
# print all entries in sorted key order
for k in sorted(d):
    print(k, d[k])
# print all entries in reverse-sorted value order
for k in sorted(d, key=d.get, reverse=True):
    print(k, d[k])
# given all keys are strings, print in case-insensitive sorted order
for k in sorted(d, key=str.lower):
    print(k, d[k])
and so forth. If you needs are different (e.g., keep track of the respective times at which keys are inserted, or their values altered, and so forth), the "ordered dictionaries" suggested in other answers will serve you better (but never with the awesome raw performance of a true dict!-).
