How can I get a ranked list from a dictionary? - python

I am working in Python. The dictionary I have looks like this:
score = {'a':{4:'c', 3:'d'}, 'b':{6:'c', 3:'d'}}
And I need to order it like this:
rank = [{a:3, b:6}, {a:4, b:3}]
Here the sub-dictionary with the greatest combination of exclusive key values is the first element, the second greatest combination is the second element, and so forth. The logic for the greatest combination would be: grab the biggest combination (total sum) of keys from each dictionary (in this case it would be a->4:'c' and b->6:'d'). Remove those values from the dictionary and grab the next biggest combination of keys (in this case, it would be a->4:'c' and b->3:'d'). This continues until the original dictionary is empty.
It is exclusive because once a value has been used from the original dict, it should be removed, or excluded from being used again in any future combinations.
I have tried all the different approaches I know, but algorithmically I am missing something.

I think I made what you're looking for? It's a weird algorithm, and it's kinda dirty due to the try/except block, but it works.
Edit: added comments and removed unneeded code.
def rank(toSort):
    # import the lowercase alphabet (works on Python 2 and 3)
    from string import ascii_lowercase as alph
    # temporary list
    _ranks = []
    # populate with empty dictionaries
    for i in range(len(toSort)):
        _ranks.append({})
    # the actual sorting algorithm
    for i in range(len(toSort) - 1):
        # iterate all k/v pairs in the supplied dictionary
        for k, v in toSort.items():
            # iterate all k/v pairs in v element
            for a, b in v.items():
                # if the alpha index of an element is equal to
                # the max alpha index of elements in its containing dictionary...
                if alph.index(b) == max(map(alph.index, v.values())):
                    _ranks[i][k] = a
                # if it isn't..
                else:
                    try:
                        _ranks[i + 1][k] = a
                    except IndexError:
                        _ranks[-1][k] = a
    return _ranks

Related

Python: Obtaining index of an element within a value list of a dictionary

I have a dictionary with key:value list pairings, and I intend to find the index of the value list that contains the desired element.
E.g., if the dictionary is:
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
Then, given element 'v2' I should be able to get index 2.
For a value list with one element, the index can be obtained with: list(my_dict.values()).index(['v1']) , however this approach does not work with lists containing multiple elements.
Using for loop, it can be obtained via:
for key, value in my_dict.items():
    if value is None:
        continue
    if 'v2' in value:
        print(list(my_dict.keys()).index(key))
Is there a neater (pythonic) way to obtain the same?
You've got an XY problem. You want to know the key that points to a value, and you think you need to find the enumeration index iterating the values so you can then use it to find the key by iteration as well. You don't need all that. Just find the key directly:
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
value = 'v2'
# Iterate key/vals pairs in genexpr; if the vals contains value, yield the key,
# next stops immediately for the first key yielded so you don't iterate the whole dict
# when the value is found on an early key
key_for_value = next(key for key, vals in my_dict.items() if vals and value in vals)
print(key_for_value)
That'll raise StopIteration if the value doesn't exist, otherwise it directly retrieves the first key where the values list for that key contains the desired value.
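If an exception is inconvenient, next() also accepts a default as its second argument. A minimal sketch, reusing the same my_dict; the key_for helper name and the None default are illustrative choices, not part of the answer above:

```python
my_dict = {"key1": ['v1'], "key2": None, "key3": ['v2', 'v3'], "key4": ['v4', 'v5', 'v6']}

def key_for(value, mapping):
    # Hypothetical helper: return the first key whose list contains value,
    # or None (the default passed to next) when no key matches.
    return next((key for key, vals in mapping.items() if vals and value in vals), None)

found = key_for('v2', my_dict)
missing = key_for('no-such-value', my_dict)
print(found, missing)
```

Note that the generator expression needs its own parentheses once next() receives a second argument.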
If you don't really have an XY problem, and the index is important (it shouldn't be, that's a misuse of dicts) it's trivial to produce it as well, changing the extraction of the key to get both, e.g.:
index, key_for_value = next((i, key) for i, (key, vals) in enumerate(my_dict.items()) if vals and value in vals)
Mind you, this is a terrible solution if you need to perform these lookups a lot and my_dict isn't trivially small; it's O(n) on the total number of values, so a large dict would take quite a while to check (relative to the cost of just looking up an arbitrary key, which is average-case O(1)). In that case, ideally, if my_dict doesn't change much/at all, you'd construct a reversed dictionary up-front to find the key(s) associated with a value, e.g.:
from collections import defaultdict
my_dict = {"key1":['v1'], "key2":None, "key3":['v2','v3'], "key4":['v4','v5','v6']}
reversed_my_dict = defaultdict(set)
for key, vals in my_dict.items():
    for val in vals or ():  # "or ()" skips keys whose value is None
        reversed_my_dict[val].add(key)
reversed_my_dict = dict(reversed_my_dict)  # Optional: Prevents future autovivification of keys
                                           # by converting back to plain dict
after which you can cheaply determine the key(s) associated with a given value with:
reversed_my_dict.get(value, ()) # Using .get prevents autovivification of keys even if still a defaultdict
which returns the set of all keys that map to that value, if any, or the empty tuple if not (if you convert back to dict above, reversed_my_dict[value] would also work if you'd prefer to get a KeyError when the value is missing entirely; leaving it a defaultdict(set) would silently construct a new empty set, map it to the key and return it, which is fine if this happens rarely, but a problem if you test thousands of unmapped values and create a corresponding thousands of empty sets for no benefit, consuming memory wastefully).
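The difference between .get and plain indexing on a defaultdict can be checked directly. The following is only a small sketch on the sample data; the variable names before, after_get and after_index are made up for the illustration:

```python
from collections import defaultdict

my_dict = {"key1": ['v1'], "key2": None, "key3": ['v2', 'v3'], "key4": ['v4', 'v5', 'v6']}

# Build the value -> keys mapping, skipping keys whose value is None.
reversed_my_dict = defaultdict(set)
for key, vals in my_dict.items():
    for val in vals or ():
        reversed_my_dict[val].add(key)

before = len(reversed_my_dict)
hit = reversed_my_dict.get('v2', ())     # set of keys holding 'v2'
miss = reversed_my_dict.get('nope', ())  # .get does not create an entry
after_get = len(reversed_my_dict)
_ = reversed_my_dict['nope']             # indexing a defaultdict does create one
after_index = len(reversed_my_dict)
print(hit, miss, before, after_get, after_index)
```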
Which you choose depends on how big my_dict is (for small my_dict, O(n) work doesn't really matter that much), how many times you need to search it (fewer searches mean less gain from reversed dict), and whether it's regularly modified. For that last point, if it's never modified, or rarely modified between lookups, rebuilding the reversed dict from scratch after each modification might be worth it for simplicity (assuming you perform many lookups per rebuild); if it's frequently modified, the reversed dict might still be worth it, you'd just have to update both the forward and reversed dicts rather than just one, e.g., expanding:
# New key
my_dict[newkey] = [newval1, newval2]
# Add value
my_dict[existingkey].append(newval)
# Delete value
my_dict[existingkey].remove(badval)
# Delete key
del my_dict[existingkey]
to:
# New key
newvals = my_dict[newkey] = [newval1, newval2]
for newval in newvals:
    reversed_my_dict[newval].add(newkey)  # reversed_my_dict.setdefault(newval, set()).add(newkey) if not defaultdict(set) anymore
# Add value
my_dict[existingkey].append(newval)
reversed_my_dict[newval].add(existingkey)  # reversed_my_dict.setdefault(newval, set()).add(existingkey) if not defaultdict(set) anymore
# Delete value
my_dict[existingkey].remove(badval)
if badval not in my_dict[existingkey]:  # Removed last copy; test only needed if one key can hold same value more than once
    reversed_my_dict[badval].discard(existingkey)
    # Optional delete badval from reverse mapping if last key removed:
    if not reversed_my_dict[badval]:
        del reversed_my_dict[badval]
# Delete key
# set() conversion not needed if my_dict's value lists guaranteed not to contain duplicates
for badval in set(my_dict.pop(existingkey)):
    reversed_my_dict[badval].discard(existingkey)
    # Optional delete badval from reverse mapping if last key removed:
    if not reversed_my_dict[badval]:
        del reversed_my_dict[badval]
respectively, roughly doubling the work incurred by modifications, in exchange for always getting O(1) lookups in either direction.
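One way to avoid forgetting the second update is to route every modification through a small wrapper. This is only a sketch of that idea, not something the answer prescribes; the TwoWayIndex class and its method names are hypothetical:

```python
from collections import defaultdict

class TwoWayIndex:
    """Hypothetical wrapper keeping a forward dict of lists and a reverse dict of sets in sync."""

    def __init__(self):
        self.forward = {}
        self.reverse = defaultdict(set)

    def add_value(self, key, value):
        # Update both directions in one place.
        self.forward.setdefault(key, []).append(value)
        self.reverse[value].add(key)

    def remove_value(self, key, value):
        self.forward[key].remove(value)      # raises ValueError if value is absent
        if value not in self.forward[key]:   # last copy under this key is gone
            self.reverse[value].discard(key)
            if not self.reverse[value]:      # no key holds this value anymore
                del self.reverse[value]

    def keys_for(self, value):
        # .get avoids creating empty sets for unknown values.
        return self.reverse.get(value, set())

index = TwoWayIndex()
index.add_value('key3', 'v2')
index.add_value('key3', 'v3')
index.add_value('key4', 'v2')
both = set(index.keys_for('v2'))
index.remove_value('key3', 'v2')
remaining = set(index.keys_for('v2'))
index.remove_value('key4', 'v2')
print(both, remaining, index.forward)
```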
If you are looking for the key corresponding to a value, you can reverse the dictionary like so:
reverse_dict = {e: k for k, v in my_dict.items() if v for e in v}
Careful with duplicate values though. The last occurrence will override the previous ones.
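To see the override in action, here is a small sketch on a made-up dictionary in which 'v1' appears under two keys, followed by one possible variant (reverse_all, a name invented here) that keeps every key instead of only the last:

```python
# Sample data invented for the illustration: 'v1' is listed under two keys.
my_dict = {"key1": ['v1'], "key2": None, "key3": ['v1', 'v3']}

# The comprehension keeps only the last key seen for each value.
reverse_dict = {e: k for k, v in my_dict.items() if v for e in v}

# Variant that collects all keys for each value.
reverse_all = {}
for k, v in my_dict.items():
    for e in v or ():
        reverse_all.setdefault(e, []).append(k)

print(reverse_dict)
print(reverse_all)
```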
Don't know if it's the best solution but this works:
value = 'v2'
list(map(lambda x : value in x, list(map(lambda x : x[1] or [], list(my_dict.items()))))).index(True)

Most frequent words in Python

I was trying to implement a code that would allow me to find the 10 most frequent words in a text. I'm new at python, and am more used to languages like C#, java or even C++. Here is what I did:
f = open("bigtext.txt","r")
word_count = {}
Basically, my idea is to create a dictionary that contains the number of times each word appears in my text. If the word is not present, I add it to the dictionary with a value of 1. If the word is already present in the dictionary, I increment its value by 1.
for x in f.read().split():
    if x not in word_count:
        word_count[x] = 1
    else:
        word_count[x] += 1
sorted(word_count.values)
Here, I sort my dictionary by values (since I'm looking for the 10 most frequent words, I need the 10 words with the biggest values).
for keys,values in word_count.items():
    values = values + 1
    print(word_count[-values])
    if values == 10:
        break
Here is the part where it all fails. I now know for sure (since I sorted my dictionary by its values) that my 10 most frequent words are the 10 last elements of my dictionary. I want to display those. So I decided to initialize values at 1 and to display my dictionary backward until values = 10, so that I won't display more than I need. But unfortunately, I get the following error:
File "<ipython-input-19-f5241b4c239c>", line 13
for keys,values in word_count.items()
^
SyntaxError: invalid syntax
I do know that my mistake is that I didn't display my dictionary backwards correctly, but I don't know how to proceed from here. So if someone can tell me how to properly display the last 10 elements of my dictionary, I would very much appreciate it. Thank you.
If you didn’t want to use collections.Counter, you could do something like this:
for word, count in sorted(word_count.items(), key=lambda x: -x[1])[:10]:
    print(word, count)
This gets all the words in the dictionary, along with their counts, into a list of tuples; sorts that list by the second item in each tuple (the count), descending; and then prints only the first (i.e. highest) ten of those.
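For comparison, the collections.Counter route mentioned above might look like the following sketch. The sample sentence stands in for the contents of bigtext.txt and is made up for the illustration:

```python
from collections import Counter

# Stand-in for f.read(); the real text would come from the file.
text = "the cat and the dog and the bird"

# Counter counts each word; most_common(n) returns the n highest (word, count) pairs.
word_count = Counter(text.split())
top_two = word_count.most_common(2)
for word, count in top_two:
    print(word, count)
```

For the original task, most_common(10) would return the ten most frequent words.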
I would like to address a big thank you to Ben who told me that I can't sort a dictionary like that.
So this is my final solution (hoping it would help someone else);
my_words = []
for keys, values in word_count.items():
    my_words.append((values,keys))
I created a list and added to it each value from my dictionary together with its corresponding word.
my_words.sort(reverse = True)
I then sorted my list by value in reverse (so that my 10 most frequent words would be the first 10 elements of my list).
print("The 10 most frequent words in this text are:")
print()
for count, word in my_words[:10]:
    print(count, word)
I then simply displayed the 10 first elements of my list.
I would also like to thank all of you who told me about NLTK. I will try it later to have a more optimal and accurate solution.
Thank You so much for your help.

How to compare a python dictionary key with a part of another dictionary's key? something like a .contains() function

Most of my small-scale project worked fine using dictionaries, so changing it now would basically mean starting over.
Let's say I have two different dictionaries(dict1 and dict2).
One being:
{'the dog': 3, 'dog jumped': 4, 'jumped up': 1, 'up onto': 8, 'onto me': 13}
Second one being:
{'up': 12, 'dog': 22, 'jumped': 33}
I want to find wherever the first word of the first dictionary is equal to the word of the second one. These 2 dictionaries don't have the same length, like in the example. Then after I find them, divide their values.
So what I want to do, sort of using a bit of Java is:
for(int i = 0;i<dict1.length(),i++){
    for(int j = 0;j<dict2.length(),j++){
        if(dict1[i].contains(dict2[j]+" ") // not sure if this works, but this
                                           // would theoretically remove the
                                           // possibility of the word being the
                                           // second part of the 2 word element
            dict1[i] / dict2[j]
What I've tried so far is trying to make 4 different lists. A list for dict1 keys, a list for dict1 values and the same for dict2. Then I've realized I don't even know how to check if dict2 has any similar elements to dict1.
I've tried making an extra value in the dictionary (a sort of index), so it would kind of get me somewhere, but as it turns out dict2.keys() isn't iterable either. Which would in turn have me believe using 4 different lists and trying to compare it somehow using that is very wrong.
Dictionaries don't have any facilities at all to handle parts of keys. Keys are opaque objects. They are either there or not there.
So yes, you would loop over all the keys in the first dictionary, extract the first word, and then test if the other dictionary has that first word as a key:
for key, dict1_value in dict1.items():
    first_word = key.split()[0]  # split on whitespace, take the first result
    if first_word in dict2:
        dict2_value = dict2[first_word]
        print(dict1_value / dict2_value)
So this takes every key in dict1, splits off the first word, and tests if that word is a key in dict2. If it is, get the values and print the result.
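Applied to the two dictionaries from the question, the same loop can be sketched as follows; collecting the results in a ratios dictionary (a name chosen here) instead of printing them is only for illustration:

```python
dict1 = {'the dog': 3, 'dog jumped': 4, 'jumped up': 1, 'up onto': 8, 'onto me': 13}
dict2 = {'up': 12, 'dog': 22, 'jumped': 33}

ratios = {}
for key, dict1_value in dict1.items():
    first_word = key.split()[0]  # first whitespace-separated word of the key
    if first_word in dict2:
        ratios[key] = dict1_value / dict2[first_word]

print(ratios)
```

Only 'dog jumped', 'jumped up' and 'up onto' produce a result, because 'the' and 'onto' are not keys of dict2.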
If you need to test those first words more often, you could make this a bit more efficient by first building an index from first words to whole keys. Simply store the first word of every key of the first dictionary in a new dictionary:
first_to_keys = {}
for key in dict1:
    first_word = key.split()[0]
    # add key to a set for first_word (and create the set if there is none yet)
    first_to_keys.setdefault(first_word, set()).add(key)
Now first_to_keys is a dictionary of first words, pointing to sets of keys (so if the same first word appears more than once, you get all full keys, not just one of them). Build this index once, and keep it up to date by updating it each time you add or remove keys from dict1.
Now you can compare that mapping to the other dictionary:
for matching in first_to_keys.keys() & dict2.keys():
    dict2_value = dict2[matching]
    for dict1_key in first_to_keys[matching]:
        dict1_value = dict1[dict1_key]
        print(dict1_value / dict2_value)
This uses the keys from two dictionaries as sets; the dict.keys() object is a dictionary view that lets you apply set operations. & gives you the intersection of the two dictionary key sets, so all keys that are present in both.
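A quick sketch of those set operations on key views, using two small made-up dictionaries (first_words and other are names chosen for this example):

```python
first_words = {'the': 1, 'dog': 2, 'up': 3}
other = {'up': 12, 'dog': 22, 'jumped': 33}

# Key views support set operators directly and return plain sets.
common = first_words.keys() & other.keys()      # keys present in both
only_first = first_words.keys() - other.keys()  # keys only in first_words

print(common, only_first)
```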
You only need to use this second option if you need to get at those first words more often. It gives you a quick path in the other direction, so you could loop over dict2, and quickly go back to the first dictionary again.
Here's a solution using the str.startswith method of strings
for phrase, val1 in dict1.items():
    for word, val2 in dict2.items():
        if phrase.startswith(word + ' '):  # trailing space: match the whole first word only
            print(val1/val2)

finding first item in a list whose first item in a tuple is matched

I have a list of several thousand unordered tuples that are of the format
(mainValue, (value, value, value, value))
Given a main value (which may or may not be present), is there a 'nice' way, other than iterating through every item looking and incrementing a value, where I can produce a list of indexes of tuples that match like this:
index = 0;
for destEntry in destList:
    if destEntry[0] == sourceMatch:
        destMatches.append(index)
    index = index + 1
So I can compare the sub values against another set, and remove the best match from the list if necessary.
This works fine, but just seems like python would have a better way!
Edit:
As per the question, when writing the original question, I realised that I could use a dictionary instead of the first value (in fact this list is within another dictionary), but after removing the question, I still wanted to know how to do it as a tuple.
With list comprehension your for loop can be reduced to this expression:
destMatches = [i for i,destEntry in enumerate(destList) if destEntry[0] == sourceMatch]
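You can also use the built-in filter() function [1] to filter your data; note that this yields the matching entries themselves rather than their indexes:
destMatches = filter(lambda destEntry:destEntry[0] == sourceMatch, destList)
[1] In Python 3, filter is a class and returns a lazy filter object rather than a list.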
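The two approaches can be compared side by side. This is a sketch on invented sample data; in Python 3 the filter object has to be passed to list() to see its contents, and it can only be consumed once:

```python
# Sample data made up for the illustration, in the (mainValue, (values...)) shape.
destList = [('a', (1, 2, 3, 4)), ('b', (5, 6, 7, 8)), ('a', (9, 10, 11, 12))]
sourceMatch = 'a'

# Comprehension with enumerate: positions of the matching tuples.
indexes = [i for i, destEntry in enumerate(destList) if destEntry[0] == sourceMatch]

# filter: the matching tuples themselves, produced lazily.
lazy = filter(lambda destEntry: destEntry[0] == sourceMatch, destList)
entries = list(lazy)
second_pass = list(lazy)  # the filter object is already exhausted

print(indexes, entries, second_pass)
```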

Removing one element from an entire dictionary

I've been working on this thing for hours, still can't figure it out :O
The problem I'm having is this. Let's say I have a dictionary with 4-element tuples as keys and integers as values. When one element is removed from every tuple in the dictionary, two of the tuples can become identical, but their values don't add up. Instead, a single entry is formed whose value is just one of the previous two values.
Let's say I have a dictionary:
dict={('A','B','D','C'): 4, ('C','B','A','D'):5, ('D','A','C','B'):3,('D','A','B','C'):1}
Now I wanna remove one letter from the entire dictionary.
For example, if I wanna remove 'B', the following new dictionary should be formed, but it isn't returned, because two of the keys are the same.
{('A','D','C'): 4, ('C','A','D'):5, ('D','A','C'):3,('D','A','C'):1}
Instead of ('D','A','C'):3,('D','A','C'):1 becoming ('D','A','C'):4, this is what ends up happening:
('D','A','C'):3 along with other tuples
So basically, one of the tuples disappears.
This is the method I'm currently using:
for next in dict:
    new_tuple=()
    for i in next:
        if i!='A':
            new_tuple+=(i,)
    new_dict[new_tuple]=dict[next]
The above code returns new_dict as the following:
{('A','D','C'): 4, ('C','A','D'):5, ('D','A','C'):3}
So what can I do to remove one letter from every tuple in the entire dictionary so that, if two of the tuples end up the same, they merge and their values add up?
You will have to rebuild your entire dictionary, as each key/value pair is going to be affected. You can use a defaultdict to make the merging easier when you encounter now-overlapping keys:
from collections import defaultdict
new_dict = defaultdict(int)
for key, value in old_dict.items():
    new_key = tuple(i for i in key if i != 'A')
    new_dict[new_key] += value
Because new_dict[new_key] defaults to 0 the first time new_key is looked up, all we have to do is add the old value: the first time we encounter a key this simply stores its value, and the next time we encounter the same key the values are 'merged' by adding them up.
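As a worked example, here is the same approach run on the dictionary from the question, removing 'B' as in the question's example (the snippet above filters 'A'). The names removed and merged are chosen for this sketch:

```python
from collections import defaultdict

old_dict = {('A', 'B', 'D', 'C'): 4, ('C', 'B', 'A', 'D'): 5,
            ('D', 'A', 'C', 'B'): 3, ('D', 'A', 'B', 'C'): 1}
removed = 'B'  # the letter to drop from every tuple

new_dict = defaultdict(int)
for key, value in old_dict.items():
    new_key = tuple(i for i in key if i != removed)
    new_dict[new_key] += value  # colliding keys have their values summed

merged = dict(new_dict)
print(merged)
```

The two tuples that both shrink to ('D', 'A', 'C') end up as a single entry whose value is 3 + 1.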
