Python get property value in dict for large dict

I have a very large dictionary, like this:
d['property1']['property2'][0]['property3']['property4']['property5']['property6']
I need to get property6. What's the simplest way for me to get this value?
I was thinking something like this would work:
d.level6[0]['property6']

Unfortunately, there is no generic built-in way to get the value of a key from a nested dict based on levels. You can, however, write a function for your specific scenario to simplify it. For example:
def get_value_from_dict(my_dict, level, key):
    return my_dict['property1']['property2'][level]['property3']['property4']['property5'][key]
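If you need this for many different paths, a more generic helper can walk an arbitrary sequence of keys and list indexes. A minimal sketch (the helper name is illustrative; the path tuple just mirrors the keys from the question):
from functools import reduce
import operator

def get_nested(d, path):
    # Follow each key or list index in path, one level at a time.
    return reduce(operator.getitem, path, d)

example = {'a': {'b': [{'c': 42}]}}
get_nested(example, ('a', 'b', 0, 'c'))   # 42
# For the question's dict, the path would be:
# ('property1', 'property2', 0, 'property3', 'property4', 'property5', 'property6')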

Related

Fast String "Startswith" Matching for Dict-like object

I currently have some code which needs to be very performant, where I am essentially doing a string dictionary key lookup:
class Foo:
    def __init__(self):
        self.fast_lookup = {"a": 1, "b": 2}

    def bar(self, s):
        return self.fast_lookup[s]
self.fast_lookup has O(1) lookup time, and there is no try/if etc. code that would slow down the lookup.
Is there any way to retain this speed while doing a "startswith" lookup instead? With the code above, calling bar with s="az" results in a KeyError; if it were changed to a "startswith" implementation, it would return 1.
NB: I am well aware how I could do this with a regex/startswith statement; I am looking specifically at performance for a startswith dict lookup.
An efficient way to do this would be to use the pyahocorasick module to construct a trie with the possible keys to match, then use the longest_prefix method to determine how much of a given string matches. If no "key" matched, it returns 0; otherwise it says how much of the string passed exists in the automaton.
After installing pyahocorasick, it would look something like:
import ahocorasick

class Foo:
    def __init__(self):
        self.fast_lookup = ahocorasick.Automaton()
        for k, v in {"a": 1, "b": 2}.items():
            self.fast_lookup.add_word(k, v)

    def bar(self, s):
        index = self.fast_lookup.longest_prefix(s)
        if not index:  # No prefix match at all
            raise KeyError(s)
        return self.fast_lookup.get(s[:index])
If it turns out the longest prefix doesn't actually map to a value (say, 'cat' is mapped, but you're looking up 'cab', and no other entry actually maps 'ca' or 'cab'), this will die with a KeyError. Tweak as needed to achieve precise behavior desired (you might need to use longest_prefix as a starting point and try to .get() for all substrings of that length or less until you get a hit for instance).
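A rough sketch of that back-off, written as a standalone helper that takes the Automaton built in the answer above (the function name is illustrative, and this assumes pyahocorasick's get() raises KeyError when no value is stored for a prefix):
def lookup_with_fallback(automaton, s):
    # Start from the longest trie prefix and retry shorter prefixes
    # until get() finds a key that actually has a stored value.
    index = automaton.longest_prefix(s)
    for length in range(index, 0, -1):
        try:
            return automaton.get(s[:length])
        except KeyError:
            continue
    raise KeyError(s)

# usage: lookup_with_fallback(foo.fast_lookup, "az")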
Note that this isn't the primary purpose of Aho-Corasick (it's an efficient way to search for many fixed strings in one or more long strings in a single pass), but tries as a whole are an efficient way to deal with prefix search of this form, and Aho-Corasick is implemented in terms of tries and provides most of the useful features of tries to make it more broadly useful (as in this case).
I don't fully understand the question, but what I would do is try to think of ways to reduce the work the lookup has to do. If you know the basic searches the startswith is going to do, you can just add those as keys to the dictionary, with values that point to the same object. Your dict will get pretty big pretty fast, but I believe it will greatly reduce the lookup cost. For a more dynamic method, you could add dict keys for the first groups of letters, up to three, for each entry (a sketch of this follows).
Without actively storing the references for each search, your code will always need to check each dict entry until it finds one that matches. You cannot reduce that.
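A minimal sketch of that precomputation idea (the helper name and the three-character cap are just illustrative; when two keys share a prefix, whichever key is inserted first wins):
def build_prefix_lookup(mapping, max_prefix_len=3):
    # Map every prefix (up to max_prefix_len characters) of every key to that key's value.
    lookup = {}
    for key, value in mapping.items():
        for i in range(1, min(len(key), max_prefix_len) + 1):
            lookup.setdefault(key[:i], value)
        lookup[key] = value
    return lookup

fast_lookup = build_prefix_lookup({"abc": 1, "bcd": 2})
fast_lookup["ab"]   # 1 -- a plain O(1) dict lookup, no startswith scan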

Python convert named string fields to tuple

Similar to this question: Tuple declaration in Python
I have this function:
def get_mouse():
    # Get: x:4631 y:506 screen:0 window:63557060
    mouse = os.popen("xdotool getmouselocation").read().splitlines()
    print mouse
    return mouse
When I run it it prints:
['x:2403 y:368 screen:0 window:60817757']
I can split the line and create 4 separate fields in a list, but from the Python code examples I've seen I feel there is a better way of doing it. I'm thinking something like x:= or window:=, etc.
I'm not sure how to properly define these "named tuple fields" nor how to reference them in subsequent commands.
I'd like to read more on the whole subject if there is a reference link handy.
It seems a dictionary would be a better option here. Dictionaries let you set a key and a value associated with that key. This way you can look up a key such as dictionary['x'] and get the corresponding value from the dictionary (if it exists!).
data = ['x:2403 y:368 screen:0 window:60817757'] #Your return data seems to be stored as a list
result = dict(d.split(':') for d in data[0].split())
result['x']
#'2403'
result['window']
#'60817757'
You can read more on a few of the topics used here, such as:
Comprehensions
Dictionaries
Happy learning!
try
dict(el.split(':') for el in mouse[0].split())
This should give you a dict (rather than tuples; note that dicts are mutable and require keys to be hashable):
{'x': '2403', 'y': '368', ...}
Also the splitlines is probably not needed, as you are only reading one line. You could do something like:
mouse = [os.popen( "xdotool getmouselocation" ).read()]
Though I don't know what xdotool getmouselocation does or if it could ever return multiple lines.
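Putting those pieces together, a minimal sketch (assuming xdotool always prints the single line shown in the question, and keeping the values as strings):
import os

def get_mouse():
    # xdotool prints one line like: x:2403 y:368 screen:0 window:60817757
    line = os.popen("xdotool getmouselocation").read().strip()
    return dict(field.split(':') for field in line.split())

pos = get_mouse()
# pos['x'] -> '2403'; convert with int() if you need numbers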

How To Create a Unique Key For A Dictionary In Python

What is the best way to generate a unique key for the contents of a dictionary? My intention is to store each dictionary in a document store along with a unique id or hash so that I don't have to load the whole dictionary from the store to check whether it already exists. Dictionaries with the same keys and values should generate the same id or hash.
I have the following code:
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
print str(a)
print hashlib.sha1(str(a)).hexdigest()
print hashlib.sha1(str(b)).hexdigest()
The last two print statements generate the same string. Is this a good implementation, or are there any pitfalls with this approach? Is there a better way to do this?
Update
Combining suggestions from the answers below, the following might be a good implementation
import hashlib
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
def get_id_for_dict(dict):
    unique_str = ''.join(["'%s':'%s';" % (key, val) for (key, val) in sorted(dict.items())])
    return hashlib.sha1(unique_str).hexdigest()
print get_id_for_dict(a)
print get_id_for_dict(b)
I prefer serializing the dict as JSON and hashing that:
import hashlib
import json
a={'name':'Danish', 'age':107}
b={'age':107, 'name':'Danish'}
# Python 2
print hashlib.sha1(json.dumps(a, sort_keys=True)).hexdigest()
print hashlib.sha1(json.dumps(b, sort_keys=True)).hexdigest()
# Python 3
print(hashlib.sha1(json.dumps(a, sort_keys=True).encode()).hexdigest())
print(hashlib.sha1(json.dumps(b, sort_keys=True).encode()).hexdigest())
Returns:
71083588011445f0e65e11c80524640668d3797d
71083588011445f0e65e11c80524640668d3797d
No - you can't rely on a particular order of elements when converting a dictionary to a string.
You can, however, convert it to sorted list of (key,value) tuples, convert it to a string and compute a hash like this:
a_sorted_list = [(key, a[key]) for key in sorted(a.keys())]
print hashlib.sha1( str(a_sorted_list) ).hexdigest()
It's not fool-proof, as the formatting of a list converted to a string, or the formatting of a tuple, could change in some future major Python version, and sort order depends on locale, etc., but I think it can be good enough.
A possible option would be to use a serialized representation of the list that preserves order. I am not sure whether the default list-to-string mechanism imposes any kind of order, but it wouldn't surprise me if it were interpreter-dependent. So I'd basically build something akin to urlencode that sorts the keys beforehand.
Not that I believe your method would fail, but I'd rather play with predictable things and avoid undocumented and/or unpredictable behavior. It's true that, despite being "unordered", dictionaries end up having an order that may even be consistent, but the point is that you shouldn't take that for granted.
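A rough sketch of that urlencode-style idea (Python 3 shown; the separator characters are arbitrary, and values are interpolated with %s, so nested structures would need more care):
import hashlib

def dict_id(d):
    # Serialize sorted (key, value) pairs explicitly so the result is deterministic.
    serialized = '&'.join('%s=%s' % (k, d[k]) for k in sorted(d))
    return hashlib.sha1(serialized.encode()).hexdigest()

a = {'name': 'Danish', 'age': 107}
b = {'age': 107, 'name': 'Danish'}
dict_id(a) == dict_id(b)   # True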

List of dictionaries: get() should loop over the list

I know it is easy to implement.
I want a dictionary like class, which takes a list of dictionaries in the constructor.
If you read from this dict by key, the dict-class should check the list of dictionaries and return the first value found. If none contains the key, a KeyError should be thrown, like with a normal dict.
This dictionary container should be read only for my usage.
You seem to be describing collections.ChainMap, which will be in the next version of Python (3.3, expected to go final later this year). For current/earlier versions of Python, you can copy the implementation from the collections source code.
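For reference, ChainMap behaves exactly as the question describes: lookups check each mapping in order, and a KeyError is raised only if no mapping has the key. A small usage sketch (Python 3.3+):
from collections import ChainMap

cm = ChainMap({'a': 1}, {'a': 10, 'b': 2})
cm['a']   # 1 -- the first mapping containing the key wins
cm['b']   # 2
# cm['c'] would raise KeyError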
Not really an answer to the question: what if you just define a method that merges all dictionaries into one? Why make a new class for it? (A small merge sketch follows the links below.)
How to merge: How to merge two Python dictionaries in a single expression?
Varargs: Can a variable number of arguments be passed to a function?
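A minimal sketch of the merge idea (merging in reverse so that earlier dictionaries in the list take priority, matching the "first value wins" requirement; the variable names are illustrative):
list_of_dicts = [{'a': 1}, {'a': 10, 'b': 2}]   # example input
merged = {}
for d in reversed(list_of_dicts):
    merged.update(d)   # earlier dicts overwrite later ones, so their values win
merged['a']   # 1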
You can easily implement this yourself with the following logic (a minimal sketch follows the steps):
Iterate over all the dictionaries in the list.
For each dictionary, check whether it has the required key using a key in d test.
If the key is found, return the corresponding value from the function.
If you have iterated over all dictionaries and the key is not found, raise a KeyError exception.
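A minimal sketch of that logic (the class name is illustrative; only read access is implemented, matching the read-only requirement):
class MultiDictReader:
    """Read-only view over a list of dictionaries; the first match wins."""

    def __init__(self, dicts):
        self._dicts = list(dicts)

    def __getitem__(self, key):
        for d in self._dicts:
            if key in d:
                return d[key]
        raise KeyError(key)

reader = MultiDictReader([{'a': 1}, {'a': 10, 'b': 2}])
reader['b']   # 2
# reader['missing'] would raise KeyError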

How to rewrite this Dictionary For Loop in Python?

I have a Dictionary of Classes where the classes hold attributes that are lists of strings.
I made this function to find out the max number of items in one of those lists for a particular person.
def find_max_var_amt(some_person):  # pass in a patient id number, get back their max number of variables for a type of variable
    max_vars = 0
    for key, value in patients[some_person].__dict__.items():
        challenger = len(value)
        if max_vars < challenger:
            max_vars = challenger
    return max_vars
What I want to do is rewrite it so that I do not have to use the .iteritems() function. This find_max_var_amt function works fine as is, but I am converting my code from using a dictionary to using a database via the dbm module, so typical dictionary functions will no longer work for me, even though the syntax for assigning and accessing the key:value pairs will be the same. Thanks for your help!
Since dbm doesn't let you iterate over the values directly, you can iterate over the keys. To do so, you could modify your for loop to look like
for key in patients[some_person].__dict__:
    value = patients[some_person].__dict__[key]
    # then continue as before
I think a bigger issue, though, will be the fact that dbm only stores strings. So you won't be able to store the list directly in the database; you'll have to store a string representation of it. And that means that when you try to compute the length of the list, it won't be as simple as len(value); you'll have to develop some code to figure out the length of the list based on whatever string representation you use. It could just be as simple as len(the_string.split(',')), just be aware that you have to do it.
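A rough sketch of that string round-trip, using Python 3's dbm module (the key name and the comma separator are just illustrative):
import dbm

db = dbm.open('patients_db', 'c')                   # dbm stores only strings/bytes
db['patient1:symptoms'] = ','.join(['cough', 'fever', 'fatigue'])

raw = db['patient1:symptoms']                       # comes back as bytes
items = raw.decode().split(',')                     # recover the individual items
print(len(items))                                   # 3
db.close()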
By the way, your existing function could be rewritten using a generator, like so:
def find_max_var_amt(some_person):
    return max(len(value) for value in patients[some_person].__dict__.itervalues())
and if you did it that way, the change to iterating over keys would look like
def find_max_var_amt(some_person):
    dct = patients[some_person].__dict__
    return max(len(dct[key]) for key in dct)
