Efficient way of retrieving the index of a dictionary entry by key in Python

As I understand it, dictionaries in Python preserve insertion order as of Python 3.7. Given a dictionary with N entries, I should be able to associate with each key an index from 0 to N-1. My question is: given a key, is there any way to retrieve this index efficiently? It seems like there should be a better way than retrieving the list of keys and searching it for the specific key of interest.

One way to do this is list(dict_name.keys()).index(key_name). Another is operator.indexOf(dict_name, key_name), which walks the dict's keys until it finds a match. Note that both are O(N), since a dict does not record positions. That said, I'm not sure why you would need the index of a key in the first place, as looking a value up by key is already O(1) on average.
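A minimal sketch of both approaches (the dictionary contents are made up):

from operator import indexOf

d = {"a": 1, "b": 2, "c": 3}

# O(N): materialize the keys as a list, then search it.
print(list(d.keys()).index("b"))   # 1

# Also O(N): walk the dict's keys until a match is found.
print(indexOf(d, "b"))             # 1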

Related

A BTree implementation in Python that allows duplicate keys?

I'm trying to build a structure for indexing a database, i.e., pairing each indexed value with a pointer to its tuple.
I found https://pythonhosted.org/BTrees/; however, the API tells me that it doesn't allow inserting the same key more than once with different values. I find this problematic when I want to create an index on a column that isn't the primary key.
Is there a BTree implementation in python that does allow for insertion of the same keys?
You can create a dictionary (or a BTree) whose values are containers: a list to allow duplicate values, or a set to exclude them, as sketched below.
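A minimal sketch of that pattern with a plain dict (the row-pointer strings are illustrative; the same pattern works with an OOBTree from the BTrees package when you need sorted-key traversal):

from collections import defaultdict

# Map each indexed column value to the list of row pointers that share it.
index = defaultdict(list)            # use defaultdict(set) to exclude duplicates
index[1999].append("row#17")
index[1999].append("row#42")         # same key, another row

print(index[1999])                   # ['row#17', 'row#42']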

Python dictionary of sets in SQL

I have a dictionary in Python where the keys are integers and the values sets of integers. Considering the potential size (millions of key-value pairs, where a set can contain from 1 to several hundreds of integers), I would like to store it in a SQL (?) database, rather than serialize it with pickle to store it and load it back in whenever I need it.
From reading around I see two potential ways to do this, each with its downsides:
Serialize the sets and store them as BLOBs: I would get a SQL table with two columns; the first column holds the dictionary keys as INTEGER PRIMARY KEY, the second holds the BLOBs, each containing one set of integers.
Downside: I can no longer alter a set without loading the complete BLOB, deserializing it, adding the value, then serializing it and writing it back to the database as a BLOB.
Add a unique key for each element of each set: I would get two columns, one with the keys (which are now key_dictionary + index of the element within the set/list), one with a single integer value in each row. I'd now be able to add values to a "set" without having to load the whole set into Python, but I would have to put more work into keeping track of all the keys.
In addition, once the database is complete, I will always need each set as a whole, so idea 1 seems faster? If, in scheme 2, I query for all primary keys BETWEEN certain values, or LIKE certain values, to obtain my whole set, will the SQL database (sqlite) still work like a hashtable? Or will it linearly search for all values that fit my BETWEEN or LIKE search?
Overall, what's the best way to tackle this problem? Obviously, if there's a completely different 3rd way that solves my problems naturally, feel free to suggest it! (haven't found any other solution by searching around)
I'm kind of new to Python and especially to databases, so let me know if my question isn't clear. :)
Your second approach is nearly what I would recommend. What I would do is have three columns:
Set ID
Key
Value
I would then create a composite primary key on the Set ID and Key, which guarantees that the combination is unique (note that set is a reserved word in SQL, hence set_id):
CREATE TABLE something (
    set_id INTEGER,
    key    INTEGER,
    value  INTEGER,
    PRIMARY KEY (set_id, key)
);
You can now add a value straight into a particular set (or update a key in a set) and select all the keys in a set.
This being said, your first strategy would be more optimal for read-heavy workloads as the size of the indexes would be smaller.
will the SQL database (sqlite) still work as a hashtable?
SQL databases tend not to use hashtables, nor do they usually do a sequential lookup. What they usually do is create an index (which tends to be some kind of tree, e.g. a B-tree) that allows for range lookups (e.g. where you don't know exactly which keys you're looking for).
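A minimal sqlite3 sketch of this schema (the table name, column names, and data are illustrative):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE int_sets (
        set_id INTEGER,
        key    INTEGER,
        value  INTEGER,
        PRIMARY KEY (set_id, key)
    )
""")

# Add elements to set 7 without loading the rest of the set.
conn.executemany("INSERT INTO int_sets VALUES (?, ?, ?)",
                 [(7, 0, 123), (7, 1, 456)])

# Read the whole set back; the primary-key index makes this a range
# scan over the B-tree, not a linear search.
members = {v for (v,) in conn.execute(
    "SELECT value FROM int_sets WHERE set_id = ?", (7,))}
print(members)   # {123, 456}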

Appropriate data structure for time series

I'm working on an application where I will need to maintain an object's trajectory. Basically, I'd like to have something like a sorted dictionary where the keys are times and the values are positions. In addition, I'll be doing linear interpolation between existing entries. I've played a little bit with SortedDict in Grant Jenks's sortedcontainers library, and it does a lot of what I want, but I'm wondering if there are solutions out there that are an even better fit? Thanks in advance for any suggestions.
If you're using pandas, there is time series support available.
If your time interval is reliably constant, a plain list or, of course, a NumPy array can be used.
Otherwise, you could look into ordered dictionaries in the collections module (standard library); note that an OrderedDict preserves insertion order rather than sorted key order, so it only fits if entries are inserted in time order:
https://docs.python.org/3/library/collections.html#collections.OrderedDict
https://docs.python.org/2/library/collections.html (Python 2)
class collections.OrderedDict([items])
Return an instance of a dict subclass, supporting the usual dict
methods. An OrderedDict is a dict that remembers the order that
keys were first inserted. If a new entry overwrites an existing entry,
the original insertion position is left unchanged. Deleting an entry
and reinserting it will move it to the end.
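For the interpolation part, a minimal sketch on top of sortedcontainers' SortedDict (the trajectory values here are made up):

from sortedcontainers import SortedDict   # pip install sortedcontainers

traj = SortedDict({0.0: 10.0, 1.0: 20.0, 3.0: 40.0})   # time -> position

def position_at(t):
    # Linearly interpolate between the stored samples that bracket t.
    if t in traj:
        return traj[t]
    i = traj.bisect_left(t)            # index of the first stored time >= t
    if i == 0 or i == len(traj):
        raise ValueError("t lies outside the stored trajectory")
    t0, p0 = traj.peekitem(i - 1)
    t1, p1 = traj.peekitem(i)
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

print(position_at(2.0))   # 30.0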

Dictionary using 2 keys with easy to find maximum value?

I'm fairly new to python and I need advice on figuring out how to implement this. I'm not sure what the best structure to use would be.
I need a dictionary-type structure that has 2 keys for a value. I need to retrieve the value with both keys, but delete the value by either key. I also need to be able to find the maximum value and return its key (or a list of keys if there are duplicate maximums).
Basically this is for finding the longest distance between any 2 points on a graph. I will have a list of points and I can calculate all the distances, but at any time I need to get the maximum distance and which points it connects. Any point can be removed at any time so I need to be able to remove values that connect to those points.
Obviously there is no existing structure that does this, so I'll have to write my own class, but does anyone have advice on where to start? At first I was going to use a dictionary with a tuple key, but is there a fast way to find the maximum value and also get the key (or list of keys, given the possibility of duplicate values)? Also, how can I easily delete values by a single part of the tuple?
I'm not asking for anyone to solve this for me, I'm trying to learn, but any advice would help. Thanks in advance.
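For what it's worth, one possible starting point is keying the dictionary by a frozenset of the two point names, so that (a, b) and (b, a) address the same entry. A hedged sketch (the point names and distances are made up):

distances = {}

def set_distance(p1, p2, d):
    distances[frozenset((p1, p2))] = d

def remove_point(p):
    # Delete every entry that involves point p.
    for pair in [k for k in distances if p in k]:
        del distances[pair]

def max_distance():
    # O(n) scan; returns the maximum and every pair tied for it.
    m = max(distances.values())
    return m, [tuple(k) for k, v in distances.items() if v == m]

set_distance("A", "B", 5.0)
set_distance("B", "C", 7.5)
set_distance("A", "C", 7.5)
print(max_distance())   # (7.5, [('B', 'C'), ('A', 'C')]) -- pair order may vary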

Intersection of Two Lists Of Strings

I had an interview question along these lines:
Given two lists of unordered customers, return a list of the intersection of the two lists. That is, return a list of the customers that appear in both lists.
Some things I established:
Assume each customer has a unique name
If the name is the same in both lists, it's the same customer
The names are of the form first name last name
There's no trickery of II's, Jr's, weird characters, etc.
I think the point was to find an efficient algorithm/use of data structures to do this as efficiently as possible.
My progress went like this:
Read one list into memory, then read the other list one item at a time to see if there is a match
Alphabetize both lists then start at the top of one list and see if each item appears in the other list
Put both lists into ordered lists, then use the shorter list to check item by item (that way, if one list has 2 items, you only check those 2 items)
Put one list into a hash, and check for the existence of keys from the other list
The interviewer kept asking, "What next?", so I assume I'm missing something else.
Any other tricks to do this efficiently?
Side note, this question was in python, and I just read about sets, which seem to do this as efficiently as possible. Any idea what the data structure/algorithm of sets is?
It really doesn't matter how it's implemented, but since set operations are implemented in C they are fast, and set([1,2,3,4,5,6]).intersection([1,2,5,9]) is likely what they wanted.
In Python, readability counts for a lot, and set operations in Python are used extensively and well vetted...
that said, another Pythonic way of doing it would be
list_new = [itm for itm in listA if itm in listB]
or
list_new = filter(lambda itm: itm in listB, listA)
though note both of these are O(n*m) unless listB is converted to a set first (and in Python 3, filter returns an iterator, so wrap it in list() if you need a list).
Basically, I believe they were testing whether you were familiar with Python, not whether you could implement the algorithm, since they asked a question that is so well suited to Python.
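A minimal sketch of the set-based approach (the customer names are invented for illustration):

list_a = ["Ann Lee", "Bob Ray", "Cal Poe"]
list_b = ["Bob Ray", "Dee Fox", "Ann Lee"]

# Hash-set intersection: O(len(list_a) + len(list_b)) on average.
common = set(list_a) & set(list_b)
print(sorted(common))   # ['Ann Lee', 'Bob Ray']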
1. Put one list into a Bloom filter and use that to filter the second list.
2. Put the filtered second list into a Bloom filter and use that to filter the first list.
3. Sort the two lists and find the intersection by one of the methods above.
The benefit of this approach (besides letting you use a semi-obscure data structure correctly in an interview) is that it doesn't require any O(n) storage until after you have (with high probability) reduced the problem size.
The interviewer kept asking, "What next?", so I assume I'm missing something else.
Maybe they would just keep asking that until you run out of answers.
http://code.google.com/p/python-bloom-filter/ is a python implementation of bloom filters.
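For illustration, a toy pure-Python Bloom filter (a hedged sketch, not tuned for realistic false-positive rates; the package above is the practical choice):

import hashlib

class BloomFilter:
    # m bits and k hash positions per item, derived from salted blake2b digests.
    def __init__(self, m=1 << 16, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.blake2b(item.encode(), salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

list_a = ["Ann Lee", "Bob Ray", "Cal Poe"]
list_b = ["Bob Ray", "Dee Fox", "Ann Lee"]

bf = BloomFilter()
for name in list_a:
    bf.add(name)

# A Bloom filter can produce false positives but never false negatives,
# so the survivors still need a final exact check (step 3 above).
candidates = [name for name in list_b if name in bf]
print(candidates)   # ['Bob Ray', 'Ann Lee']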
