Dictionary using 2 keys with easy to find maximum value? - python

I'm fairly new to python and I need advice on figuring out how to implement this. I'm not sure what the best structure to use would be.
I need a dictionary-type structure that has 2 keys for each value. I need to retrieve the value with both keys, but delete the value by either key. I also need to be able to find the maximum value and return its key (or a list of keys if there are duplicate maximums).
Basically this is for finding the longest distance between any 2 points on a graph. I will have a list of points and I can calculate all the distances, but at any time I need to get the maximum distance and which points it connects. Any point can be removed at any time so I need to be able to remove values that connect to those points.
Obviously there is no existing structure that does this, so I'll have to write my own class, but does anyone have advice on where to start? At first I was going to use a dictionary with a tuple key, but is there a fast way to find the maximum value and also get the key (or list of keys, given the possibility of duplicate values)? Also, how can I easily delete values by a single part of the tuple?
I'm not asking for anyone to solve this for me, I'm trying to learn, but any advice would help. Thanks in advance.
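One way to start, sketched below with class and method names of my own invention: keep the distances in a dict keyed by a sorted point-pair tuple, and keep a second dict mapping each point to the pair keys that touch it, so removing a point can find all of its pairs without scanning. The maximum is found with a plain O(n) scan, which is the simple baseline before reaching for a heap.

```python
class PairDistances:
    """Sketch: distances keyed by point pairs, removable by either point."""

    def __init__(self):
        self._dist = {}    # (a, b) -> distance, with the pair stored sorted
        self._pairs = {}   # point -> set of pair keys that involve it

    @staticmethod
    def _key(p, q):
        # Normalize the pair so get('a', 'b') and get('b', 'a') agree
        return (p, q) if p < q else (q, p)

    def set(self, p, q, d):
        k = self._key(p, q)
        self._dist[k] = d
        self._pairs.setdefault(p, set()).add(k)
        self._pairs.setdefault(q, set()).add(k)

    def get(self, p, q):
        return self._dist[self._key(p, q)]

    def remove_point(self, p):
        # Delete every distance that involves point p
        for k in self._pairs.pop(p, set()):
            self._dist.pop(k, None)

    def max_pairs(self):
        # O(n) scan; returns every pair tied for the maximum distance
        if not self._dist:
            return []
        m = max(self._dist.values())
        return [k for k, d in self._dist.items() if d == m]
```

For large point sets where the maximum is queried far more often than points are removed, a heap or sorted container would replace the linear scan, at the cost of more bookkeeping on deletion.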

Related

Efficient way of retrieving index of dictionary entry by key in Python

As I understand it, dictionaries in Python are ordered as of Python 3.7. Given a dictionary with N entries, I should be able to associate to each key an index from 0 to N-1. My question is, given a key, is there any way to retrieve this index in an efficient manner? It seems like there should be a more efficient way than retrieving the list of keys and searching for the specific key of interest.
One of the ways to do this is list(dict_name.keys()).index(key_name). Another way would be using operator.indexOf. I'm not sure why you would need the index of the keys in the first place, as getting a value from a dictionary is already O(1), or constant time.
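Both lookups mentioned above, side by side. Note that each is O(N): the keys have to be walked until the target is found, which is why the index is rarely worth computing.

```python
import operator

d = {'a': 1, 'b': 2, 'c': 3}

idx1 = list(d.keys()).index('b')        # materialize the keys, then search
idx2 = operator.indexOf(d.keys(), 'b')  # search the keys view directly

print(idx1, idx2)  # 1 1
```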

python - access all dictionary elements not corresponding to key

I'm trying to write code to compute pairwise differences in a bunch of data within and between groups. That is, I've loaded the data into a dictionary so that the ith value of data j in group k is accessible by
data[j][group[k]][i]
I've written for loops to calculate all of the within group pairwise differences, but I'm a little stuck on how to then calculate the between groups pairwise differences. Is there a way to compare all of the values in data[j][group[k]] to all of the values in data[j][*NOT*group[k]]?
Thanks for any suggestions.
You could compare them all and then throw out the ones where the group is the same as the one being compared to (I hope that makes sense).
Or
Make a temporary group[l] equal to group[k] minus the instance you are comparing to.
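A sketch of the first suggestion, assuming the layout described in the question (data[j][group] is a list of values): iterate over all group pairs and skip the pairs where both values come from the same group. The sample data here is my own.

```python
# Hypothetical sample data in the question's data[j][group][i] layout
data = {'j': {'g1': [1.0, 2.0], 'g2': [4.0], 'g3': [7.0, 9.0]}}

between = []
groups = data['j']
for k, values_k in groups.items():
    for l, values_l in groups.items():
        if l == k:
            continue  # throw out within-group comparisons
        for a in values_k:
            for b in values_l:
                between.append(a - b)

# Each ordered cross-group pair appears once, so 5*5 - 9 = 16 differences
print(len(between))  # 16
```

If only unordered pairs are wanted, itertools.combinations over the group names avoids counting each pair twice.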

Python dictionary of sets in SQL

I have a dictionary in Python where the keys are integers and the values are sets of integers. Considering the potential size (millions of key-value pairs, where a set can contain from 1 to several hundred integers), I would like to store it in a SQL (?) database, rather than serialize it with pickle to store it and load it back in whenever I need it.
From reading around I see two potential ways to do this, both with its downsides:
Serialize the sets and store them as BLOBs: So I would get an SQL with two columns, the first column are the keys of the dictionary as INTEGER PRIMARY KEY, the second column are the BLOBS, containing a set of integers.
Downside: Not able to alter sets anymore without loading the complete BLOB in, and after adding a value to it, serialize it back and insert it back to the database as a BLOB.
Add a unique key for each element of each set: I would get two columns, one with the keys (which are now key_dictionary + index element of set/list), one with one integer value in each row. I'd now be able to add values to a "set" without having to load the whole set into python. I would have to put more work in keeping track of all the keys.
In addition, once the database is complete, I will always need sets as a whole, so idea 1 seems to be faster? If I query for all primary keys BETWEEN certain values, or LIKE certain values, to obtain my whole set in system 2, will the SQL database (sqlite) still work as a hashtable? Or will it linearly search for all values that fit my BETWEEN or LIKE search?
Overall, what's the best way to tackle this problem? Obviously, if there's a completely different 3rd way that solves my problems naturally, feel free to suggest it! (haven't found any other solution by searching around)
I'm kind of new to Python and especially to databases, so let me know if my question isn't clear. :)
Your second approach is nearly what I would recommend. What I would do is have three columns:
Set ID
Key
Value
I would then create a composite primary key on the Set ID and Key which guarantees that the combination is unique:
CREATE TABLE something (
    set_id INTEGER,
    key INTEGER,
    value INTEGER,
    PRIMARY KEY (set_id, key)
);
You can now add a value straight into a particular set (or update a key in a set) and select all keys in a set.
This being said, your first strategy would be more optimal for read-heavy workloads as the size of the indexes would be smaller.
will the SQL database (sqlite) still work as a hashtable?
SQL databases tend not to use hashtables. Nor do they usually do a sequential lookup. What they do is usually create an index (Which tends to be some kind of tree, e.g. a B-tree) which allows for range lookups (e.g. where you don't know exactly what keys you're looking for).
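A small sqlite3 sketch of that three-column layout (the table name and data are my own; SET is an SQL keyword, hence the set_id column name). Thanks to the composite primary key's index, pulling one whole set back out is a range scan over that index rather than a full-table search.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("""
    CREATE TABLE int_sets (
        set_id INTEGER,
        key    INTEGER,
        value  INTEGER,
        PRIMARY KEY (set_id, key)
    )
""")

# Add values to sets without ever loading a whole set into Python
con.executemany("INSERT INTO int_sets VALUES (?, ?, ?)",
                [(7, 0, 10), (7, 1, 20), (8, 0, 99)])

# Retrieve one complete set; the PK index narrows this to set_id = 7
rows = con.execute(
    "SELECT value FROM int_sets WHERE set_id = ?", (7,)).fetchall()
result = {v for (v,) in rows}
print(result)  # {10, 20}
```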

Why are Python dictionaries NOT stored in the order they were created? [duplicate]

This question already has answers here:
Why is the order in dictionaries and sets arbitrary?
(5 answers)
Closed 7 years ago.
Just curious more than anything else, but why isn't a dictionary such as the one below ordered the same as it was created? Yet when I print out test it returns the same order from then on...
test = {'one':'1', 'two':'2', 'three':'3', 'four':'4'}
It's not that I need them ordered, but it's just been on my mind for a while as to what is occurring here.
The only thing I've found on this is a quote from this article:
Python uses complex algorithms to determine where the key-value pairs are stored in a dictionary.
But what are these "complex algorithms" and why?
Python needs to be able to access D[thing] quickly.
If it stores the values in the order that it receives them, then when you ask it for D[thing], it doesn't know in advance where it put that value. It has to go and find where the key thing appears and then find that value. Since it has no control over the order these are received, this would take about N/2 steps on average where N is the number of keys it's received.
But if instead it has a function (called a hash) that can turn thing into a location in memory, it can quickly take thing, calculate that location, and check that spot of memory. Of course, it's got to do a bit more overhead - checking that D[thing] has actually been defined, and checking for those rare cases where you may have defined D[thing1] and D[thing2] where the hash function of thing1 and thing2 happen to be the same (in which case a "collision" occurs and Python has to figure out a new place to put one of them).
So for your example, you might expect that when you search for test['four'] it just goes to the last entry in the list it's stored in and says "aha, that's '4'." But it can't just do that. How does it know that four corresponds to the last entry of the list? It could have come in any order, so it would have to create some other data structure which allows it to quickly tell that four was the last entry. This would take a lot of overhead.
It would be possible to make it output things in the order they were entered, but that would still require additional overhead tracking the order things were entered.
Out of curiosity: if you want an ordered dictionary, use OrderedDict.
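To see the difference concretely, here is the question's dictionary rebuilt as an OrderedDict, which always iterates in insertion order. (On the Python versions this question is about, a plain dict gave the arbitrary hash-determined order described above; since CPython 3.7 plain dicts preserve insertion order too.)

```python
from collections import OrderedDict

test = OrderedDict([('one', '1'), ('two', '2'),
                    ('three', '3'), ('four', '4')])

# Iteration follows insertion order, regardless of how the keys hash
print(list(test))  # ['one', 'two', 'three', 'four']
```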

Appropriate data structure for time series

I'm working on an application where I will need to maintain an object's trajectory. Basically, I'd like to have something like a sorted dictionary where the keys are times, and the values are positions. In addition, I'll be doing linear interpolation between existing entries. I've played a little bit with SortedDictionary in Grant Jenks's SortedContainers library, and it does a lot of what I want, but I'm wondering if there are solutions out there that are an even better fit? Thanks in advance for any suggestions.
If you're using pandas, there is time series support available.
If your time interval is reliably constant, a list or, of course, a numpy array can be used.
Otherwise, you could look into ordered dictionaries in the collections module (std lib)
https://docs.python.org/3/library/collections.html#collections.OrderedDict
https://docs.python.org/2/library/collections.html (Python 2)
class collections.OrderedDict([items])
Return an instance of a dict subclass, supporting the usual dict methods. An OrderedDict is a dict that remembers the order that keys were first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged. Deleting an entry and reinserting it will move it to the end.
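If a third-party dependency is undesirable, the "sorted dictionary plus linear interpolation" idea can also be sketched with the stdlib bisect module and two parallel sorted lists (all names here are my own; this assumes queries fall within the sampled time range):

```python
from bisect import bisect_left

times, positions = [], []

def add(t, pos):
    # Insert keeping both lists sorted by time
    i = bisect_left(times, t)
    times.insert(i, t)
    positions.insert(i, pos)

def position_at(t):
    # Exact hit, or linear interpolation between the two neighbours
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return positions[i]
    t0, t1 = times[i - 1], times[i]
    p0, p1 = positions[i - 1], positions[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

add(0.0, 0.0)
add(10.0, 100.0)
print(position_at(5.0))  # 50.0
```

SortedContainers' SortedDict does the same bookkeeping with better asymptotics for inserts, so for large trajectories it remains the more appropriate choice.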
