Aggregating and Renaming Keys in Dictionary - python

I have a word occurrence dictionary, and a synonym dictionary.
Word occurrence dictionary example:
word_count = {'grizzly': 2, 'panda': 4, 'beer': 3, 'ale': 5}
Synonym dictionary example:
synonyms = {
    'bear': ['grizzly', 'bear', 'panda', 'kodiak'],
    'beer': ['beer', 'ale', 'lager']
}
I would like to combine/rename and aggregate the word count dictionary into
new_word_count = {'bear': 6, 'beer': 8}
I thought I would try this:
new_dict = {}
for word_key, word_value in word_count.items():  # Loop through word count dict
    for syn_key, syn_value in synonyms.items():  # Loop through synonym dict
        if word_key in [x for y in syn_value for x in y]:  # Check if word in synonyms
            if syn_key in new_dict:  # If so:
                new_dict[syn_key] += word_value  # Increment count
            else:  # If not:
                new_dict[syn_key] = word_value  # Create key
But this isn't working: new_dict ends up empty. Also, is there an easier way to do this? Maybe using a dictionary comprehension?

Using dict comprehension, sum and dict.get:
In [11]: {w: sum(word_count.get(x, 0) for x in ws) for w, ws in synonyms.items()}
Out[11]: {'bear': 6, 'beer': 8}
Using collections.Counter and dict.get:
from collections import Counter
ec = Counter()
for x, vs in synonyms.items():
    for v in vs:
        ec[x] += word_count.get(v, 0)
print(ec)  # Counter({'beer': 8, 'bear': 6})
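As for why the original loop leaves new_dict empty: each syn_value is already a list of words, so the nested comprehension [x for y in syn_value for x in y] iterates over the characters of each word, and a whole word such as 'grizzly' is never found among single letters. A minimal sketch of the same loop with a direct membership test (dict.get stands in for the if/else; the data is copied from the question):

```python
word_count = {'grizzly': 2, 'panda': 4, 'beer': 3, 'ale': 5}
synonyms = {
    'bear': ['grizzly', 'bear', 'panda', 'kodiak'],
    'beer': ['beer', 'ale', 'lager'],
}

# What the original test compared against: single characters, not words.
flattened = [x for y in synonyms['beer'] for x in y]
print(flattened[:4])  # ['b', 'e', 'e', 'r']

new_dict = {}
for word_key, word_value in word_count.items():
    for syn_key, syn_value in synonyms.items():
        if word_key in syn_value:  # test against the list of words itself
            new_dict[syn_key] = new_dict.get(syn_key, 0) + word_value

print(new_dict)  # {'bear': 6, 'beer': 8}
```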

Let's change your synonym dictionary a little. Instead of mapping from a word to a list of all its synonyms, let's map from each word to its parent synonym (e.g. ale to beer). This should speed up lookups.
synonyms = {
    'bear': ['grizzly', 'bear', 'panda', 'kodiak'],
    'beer': ['beer', 'ale', 'lager']
}
synonyms = {syn: word for word, syns in synonyms.items() for syn in syns}
Now, let's make your aggregate dictionary:
word_count = {'grizzly': 2, 'panda': 4, 'beer': 3, 'ale': 5}
new_word_count = {}
for word, count in word_count.items():  # .items() yields (key, value) pairs
    word = synonyms[word]  # map the word to its parent synonym
    if word not in new_word_count:
        new_word_count[word] = 0
    new_word_count[word] += count
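One caveat with the inverted mapping: looking up synonyms[word] raises KeyError for any counted word that appears in no synonym list. A sketch of one way to handle that, assuming such words should simply be kept under their own name (that policy, the name parent, and the extra 'wine' entry are illustrative assumptions, not part of the question):

```python
from collections import Counter

synonym_lists = {
    'bear': ['grizzly', 'bear', 'panda', 'kodiak'],
    'beer': ['beer', 'ale', 'lager'],
}
# Invert: each synonym points at its parent word.
parent = {syn: word for word, syns in synonym_lists.items() for syn in syns}

# 'wine' is a made-up entry that has no synonym list.
word_count = {'grizzly': 2, 'panda': 4, 'beer': 3, 'ale': 5, 'wine': 1}

new_word_count = Counter()
for word, count in word_count.items():
    # Fall back to the word itself when it has no parent.
    new_word_count[parent.get(word, word)] += count

print(dict(new_word_count))  # {'bear': 6, 'beer': 8, 'wine': 1}
```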

Related

Write a function to add key-value pairs

I'd like to write a function that will take one argument (a text file), use its contents as keys and assign values to the keys. But I'd like the values to go from 1 to n:
{'A': 1, 'B': 2, 'C': 3, 'D': 4... }.
I tried to write something like this:
Base code which kind of works:
filename = 'words.txt'
with open(filename, 'r') as f:
    text = f.read()
ready_text = text.split()
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    for item in lst:
        if item not in dictionary:
            dictionary[item] = 1
        else:
            dictionary[item] += 1
    return dictionary
print(create_dict(ready_text))
The output: {'A': 1, 'B': 1, 'C': 1, 'D': 1... }.
Attempt to make the thing work:
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    values = list(range(100)) # values
    for item in lst:
        if item not in dictionary:
            for value in values:
                dictionary[item] = values[value]
        else:
            dictionary[item] = values[value]
    return dictionary
The output: {'A': 99, 'B': 99, 'C': 99, 'D': 99... }.
My attempt doesn't work. It gives all the keys 99 as their value.
Bonus question: How can I optimize my code and make it look more elegant/cleaner?
Thank you in advance.
You can use dict comprehension with enumerate (note the start parameter):
words.txt:
colorless green ideas sleep furiously
Code:
with open('words.txt', 'r') as f:
    words = f.read().split()
dct = {word: i for i, word in enumerate(words, start=1)}
print(dct)
# {'colorless': 1, 'green': 2, 'ideas': 3, 'sleep': 4, 'furiously': 5}
Note that "to be or not to be" will result in {'to': 5, 'be': 6, 'or': 3, 'not': 4}, which is perhaps not what you want. Ending up with a single entry for a repeated word is not a quirk of this particular algorithm; it is inevitable as long as you use a dict, because keys are unique.
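The overwrite is easy to check, and if the first position of each word is what's wanted instead, dict.setdefault keeps the earliest value. A small sketch (last_seen and first_seen are illustrative names):

```python
words = "to be or not to be".split()

# Later occurrences overwrite earlier ones in a comprehension.
last_seen = {word: i for i, word in enumerate(words, start=1)}
print(last_seen)   # {'to': 5, 'be': 6, 'or': 3, 'not': 4}

# setdefault only stores a value when the key is new.
first_seen = {}
for i, word in enumerate(words, start=1):
    first_seen.setdefault(word, i)
print(first_seen)  # {'to': 1, 'be': 2, 'or': 3, 'not': 4}
```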
Your program sends a list of strings to create_dict. For each string in the list, if that string is not in the dictionary, then the dictionary value for that key is set to 1. If that string has been encountered before, then the value of that key is increased by 1. So, since every key is being set to 1, then that must mean there are no repeat keys anywhere, meaning you're sending a list of unique strings.
So, in order to have the numerical values increase with each new key, you just have to increment some number during your loop:
num = 0
for item in lst:
    num += 1
    dictionary[item] = num
There's an easier way to loop through both numbers and list items at the same time, via enumerate():
for num, item in enumerate(lst, start=1): # start at 1 and not 0
    dictionary[item] = num
You can use this code. If an item appears in lst more than once, it gets an index only the first time it is seen, so the values stay consecutive:
def create_dict(lst):
    """ go through the arg, stores items in it as keys in a dict"""
    dictionary = dict()
    idx = 1
    for item in lst:
        if item not in dictionary:
            dictionary[item] = idx
            idx += 1
    return dictionary

creating a dictionary within a dictionary

I have a dictionary called playlist with a timestamp as the key and song title and artist as the values, stored in a tuple, formatted as below:
{datetime.datetime(2019, 11, 4, 20, 2): ('Closer', 'The Chainsmokers'),
datetime.datetime(2019, 11, 4, 19, 59): ('Piano Man', 'Elton John'),
datetime.datetime(2019, 11, 4, 19, 55): ('Roses', 'The Chainsmokers')}
I am trying to take the artist from each tuple and use it as the key in a new dictionary, with the values being the songs by that artist and how often each one occurs in the playlist. Example output is:
{'Chainsmokers': {'Closer': 3, 'Roses': 1},
'Elton John': {'Piano Man': 2}, … }
This is what I have for code so far:
dictionary = {}
for t in playlist.values():
    if t[1] in dictionary:
        dictionary[t[1]] += 1
    else:
        dictionary[t[1]] = 1
print(dictionary)
However, this only returns the artist as the key and the frequency of artist plays as values.
Thanks in advance for any help.
Use a defaultdict whose default is another defaultdict, which in turn has int as its default:
from collections import defaultdict
d = defaultdict(lambda: defaultdict(int))
for song, artist in playlist.values():
    d[artist][song] += 1
print(d)
# contents (print also shows the defaultdict wrappers):
# {'The Chainsmokers': {'Closer': 1, 'Roses': 1}, 'Elton John': {'Piano Man': 1}}
A non-defaultdict method is a bit long-winded, as we need to make sure the inner dicts exist, which is what defaultdict handles for us:
d = {}
for song, artist in playlist.values():
    d.setdefault(artist, {})
    d[artist].setdefault(song, 0)
    d[artist][song] += 1
Just for fun, here's an alternative version that makes use of collections.Counter:
from collections import defaultdict, Counter
song_count = defaultdict(dict)
for (song, artist), count in Counter(playlist.values()).items():
    song_count[artist][song] = count
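To try the nested-counting idea end to end, here is a self-contained sketch with sample data. The repeated 'Closer' play is invented so that one count exceeds 1, and the names counts and plain are illustrative. The last step converts the nested defaultdicts to plain dicts for cleaner printing.

```python
import datetime
from collections import defaultdict

playlist = {
    datetime.datetime(2019, 11, 4, 20, 2): ('Closer', 'The Chainsmokers'),
    datetime.datetime(2019, 11, 4, 19, 59): ('Piano Man', 'Elton John'),
    datetime.datetime(2019, 11, 4, 19, 55): ('Roses', 'The Chainsmokers'),
    # Invented extra play, not in the question's data.
    datetime.datetime(2019, 11, 4, 19, 50): ('Closer', 'The Chainsmokers'),
}

counts = defaultdict(lambda: defaultdict(int))
for song, artist in playlist.values():
    counts[artist][song] += 1

# Strip the defaultdict wrappers for display.
plain = {artist: dict(songs) for artist, songs in counts.items()}
print(plain)
# {'The Chainsmokers': {'Closer': 2, 'Roses': 1}, 'Elton John': {'Piano Man': 1}}
```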

index word in dictionary

I have a text file. I want to put each word from it into a dictionary and then print out every index position at which that word occurs in the file.
The code I have is only giving me the number of times the word is in the text file. How can I change this?
I have already converted to lowercase.
dicti = {}
for eachword in wordsintxt:
    freq = dicti.get(eachword, None)
    if freq == None:
        dicti[eachword] = 1
    else:
        dicti[eachword] = freq + 1
print(dicti)
Change your code to keep the indices themselves, rather than merely counting them:
for index, eachword in enumerate(wordsintxt):
    freq = dicti.get(eachword, None)
    if freq is None:
        dicti[eachword] = [index]  # first occurrence: start the list with this index
    else:
        dicti[eachword].append(index)
If you still need the word frequency, that's easy to recover:
freq = len(dicti[word])
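Putting those pieces together, a runnable sketch with a made-up word list (wordsintxt here is sample data, not the asker's file). It uses dict.setdefault in place of the get/if check, which is a swapped-in shortcut rather than the code above:

```python
wordsintxt = ["the", "cat", "saw", "the", "other", "cat"]

dicti = {}
for index, eachword in enumerate(wordsintxt):
    # setdefault creates the list on first sight and returns it either way.
    dicti.setdefault(eachword, []).append(index)

print(dicti)
# {'the': [0, 3], 'cat': [1, 5], 'saw': [2], 'other': [4]}

# Frequencies fall out of the list lengths.
freq = {word: len(indexes) for word, indexes in dicti.items()}
print(freq)
# {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}
```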
Update per OP comment
Without enumerate, simply provide that functionality yourself:
for index in range(len(wordsintxt)):
    eachword = wordsintxt[index]
I'm not sure why you'd want to do that; the operation is idiomatic and common enough that Python developers created enumerate for exactly that purpose.
You can use this:
wordsintxt = ["hello", "world", "the", "a", "Hello", "my", "name", "is", "the"]
words_data = {}
for i, word in enumerate(wordsintxt):
    word = word.lower()
    words_data[word] = words_data.get(word, {'freq': 0, 'indexes': []})
    words_data[word]['freq'] += 1
    words_data[word]['indexes'].append(i)
for k, v in words_data.items():
    print(k, '\t', v)
Which prints:
hello {'freq': 2, 'indexes': [0, 4]}
world {'freq': 1, 'indexes': [1]}
the {'freq': 2, 'indexes': [2, 8]}
a {'freq': 1, 'indexes': [3]}
my {'freq': 1, 'indexes': [5]}
name {'freq': 1, 'indexes': [6]}
is {'freq': 1, 'indexes': [7]}
You can avoid checking whether the key exists in your dictionary and then performing a custom action by just using data[key] = data.get(key, STARTING_VALUE).
Greetings!
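A note on that pattern: dict.get by itself never stores the default, which is why the result is assigned back into the dictionary. dict.setdefault does the lookup and the store in a single call. A short comparison on made-up data (via_get and via_setdefault are illustrative names):

```python
words = ["a", "b", "a"]

# Pattern from the answer: get, then assign back.
via_get = {}
for i, w in enumerate(words):
    via_get[w] = via_get.get(w, [])
    via_get[w].append(i)

# setdefault inserts the default if the key is missing and returns the value.
via_setdefault = {}
for i, w in enumerate(words):
    via_setdefault.setdefault(w, []).append(i)

print(via_get)         # {'a': [0, 2], 'b': [1]}
print(via_setdefault)  # {'a': [0, 2], 'b': [1]}
```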
Use collections.defaultdict with enumerate, and just append every index you get from enumerate:
from collections import defaultdict
with open('test.txt') as f:
    content = f.read()
words = content.split()
dd = defaultdict(list)
for i, v in enumerate(words):
    dd[v.lower()].append(i)
print(dd)
# defaultdict(<class 'list'>, {'i': [0, 6, 35, 54, 57], 'have': [1, 36, 58],... 'lowercase.': [62]})

How to alternate lower and upper case on dictionary keys? Python 3

I'm new to Python 3.
What I want to do is alternate upper and lower case, but only on the dictionary keys.
My dictionary is created from a list: each key is a word (a list element) and its value is the number of times that element appears in the list.
kb = str(input("Give me a string: "));
txt = kb.lower(); #Turn string into lowercase
cadena = txt.split(); #Turn string into list
dicc = {};
for word in cadena:
    if (word in dicc):
        dicc[word] = dicc[word] + 1
    else:
        dicc[word] = 1
print(dicc)
With this code I can get, for example:
input: "Hi I like PYthon i am UsING python"
{'hi': 1, 'i': 2, 'like': 1, 'python': 2, 'am': 1, 'using': 1}
but what I am actually trying to get is:
{'hi': 1, 'I': 2, 'like': 1, 'PYTHON': 2, 'am': 1, 'USING': 1}
I tried using this:
for n in dicc.keys():
    if (g%2 == 0):
        n.upper()
    else:
        n.lower()
print(dicc)
But it seems that I have no idea of what I'm doing.
Any help would be appreciated.
Using itertools and collections.OrderedDict (to guarantee order in Python < 3.7)
Setup
import itertools
from collections import OrderedDict
s = 'Hi I like PYthon i am UsING python'
switcher = itertools.cycle((str.lower, str.upper))
d = OrderedDict()
final = OrderedDict()
First, create an OrderedDict just to count the occurrences of strings in your list (since, judging by your output, you want matches to be case-insensitive):
for word in s.lower().split():
    d.setdefault(word, 0)
    d[word] += 1
Next, use itertools.cycle to call str.lower or str.upper on keys and create your final dictionary:
for k, v in d.items():
    final[next(switcher)(k)] = v
print(final)
OrderedDict([('hi', 1), ('I', 2), ('like', 1), ('PYTHON', 2), ('am', 1), ('USING', 1)])
Your n in dicc.keys() line is wrong. You are trying to use n as both the position in the array of keys and the key itself.
Also the semicolons are unnecessary.
This should do what you want:
from collections import OrderedDict
# Receive user input
kb = str(input("Give me a string: "))
txt = kb.lower()
cadena = txt.split()
dicc = OrderedDict()
# Construct the word counter
for word in cadena:
    if word in dicc:
        dicc[word] += 1
    else:
        dicc[word] = 1
If you just want to print the output with alternating case, you can do something like this:
# Print the word counter with alternating case
elems = []
for i, (word, wordcount) in enumerate(dicc.items()):
    if i % 2 == 1:  # upper-case every second word, as in the desired output
        word = word.upper()
    elems.append('{}: {}'.format(word, wordcount))
print('{' + ', '.join(elems) + '}')
Or you can make a new OrderedDict with alternating case...
dicc_alt_case = OrderedDict(
    (word.upper() if i % 2 == 1 else word, wordcount)
    for i, (word, wordcount) in enumerate(dicc.items()))
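On Python 3.7+, where plain dicts keep insertion order, the same idea fits in a few lines without OrderedDict. A sketch, assuming odd positions (counting from 0) are the ones to upper-case, as in the asker's desired output; alternate_case_counts is a hypothetical helper name:

```python
from collections import Counter

def alternate_case_counts(text):
    """Count words case-insensitively, then upper-case every second key."""
    counts = Counter(text.lower().split())  # keeps first-seen order (3.7+)
    return {
        word.upper() if i % 2 else word: n
        for i, (word, n) in enumerate(counts.items())
    }

result = alternate_case_counts("Hi I like PYthon i am UsING python")
print(result)
# {'hi': 1, 'I': 2, 'like': 1, 'PYTHON': 2, 'am': 1, 'USING': 1}
```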

Reassign dictionary values

I have a dictionary like
{'A': 0, 'B': 1, 'C': 2, 'D': 3, etc}
How can I remove elements from this dictionary without creating gaps in values, in case the dictionary is not ordered?
An example:
I have a big matrix, where rows represent words, and columns represent documents where these words are encountered. I store the words and their corresponding indices as a dictionary. E.g. for this matrix
2 0 0
1 0 3
0 5 1
4 1 2
the dictionary would look like:
words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
If I remove the words 'apple' and 'banana', the matrix would contain only two rows. So the value of 'orange' in the dictionary should now equal 0 and not 1, and the value of 'pear' should be 1 instead of 3.
In Python 3.6+ dictionaries are ordered, so I can just write something like this to reassign the values:
i = 0
for k, v in words.items():
    v = i
    i += 1
or, alternatively
words = dict(zip(terms.keys(), range(0, matrix.shape[0])))
I think this is far from the most efficient way to change the values, and it wouldn't work with unordered dictionaries. How can I do it efficiently? Is there an easy way to reassign the values when the dictionary is not ordered?
Turn the dict into a sorted list and then build a new dict without the words you want to remove:
import itertools
to_remove = {'apple', 'banana'}
# Step 1: sort the words
ordered_words = [None] * len(words)
for word, index in words.items():
    ordered_words[index] = word
# ordered_words: ['apple', 'orange', 'banana', 'pear']
# Step 2: Remove unwanted words and create a new dict
counter = itertools.count()
words = {word: next(counter) for word in ordered_words if word not in to_remove}
# result: {'orange': 0, 'pear': 1}
This has a runtime of O(n) because manually ordering the list with indexing operations is a linear operation, as opposed to sorted which would be O(n log n).
See also the documentation for itertools.count and next.
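Wrapped up as a function, the same approach can also report which old indices survive, which could be used to pick the matching matrix rows. This is a sketch: remove_and_reindex and its return shape are illustrative choices, and it assumes the values are exactly 0..n-1 with no gaps.

```python
def remove_and_reindex(words, to_remove):
    """Return (new_mapping, kept_old_indices) after dropping to_remove."""
    # Place each word at its index: O(n), no comparison sort needed.
    ordered = [None] * len(words)
    for word, index in words.items():
        ordered[index] = word

    new_words = {}
    kept = []
    for old_index, word in enumerate(ordered):
        if word not in to_remove:
            new_words[word] = len(new_words)  # next free index
            kept.append(old_index)
    return new_words, kept

# Deliberately listed out of order to show insertion order doesn't matter.
words = {'pear': 3, 'apple': 0, 'banana': 2, 'orange': 1}
new_words, kept = remove_and_reindex(words, {'apple', 'banana'})
print(new_words)  # {'orange': 0, 'pear': 1}
print(kept)       # [1, 3]
```

If the matrix is a NumPy array, indexing it with the kept list (matrix[kept]) would select the remaining rows in the new order.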
You could always keep an inverted dictionary that maps indices to words, and use that as a reference for keeping the order of the original dictionary. Then you could remove the words, and rebuild the dictionary again:
words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
# reverse dict for index -> word mappings
inverted = {i: word for word, i in words.items()}
remove = {'apple', 'banana'}
# sort/remove the words
new_words = [inverted[i] for i in range(len(inverted)) if inverted[i] not in remove]
# rebuild new dictionary
new_dict = {word: i for i, word in enumerate(new_words)}
print(new_dict)
Which outputs:
{'orange': 0, 'pear': 1}
Note: Like the accepted answer, this is also O(n).
You can use your existing logic, using a representation of the dictionary that is sorted:
import operator
words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
sorted_words = sorted(words.items(), key=operator.itemgetter(1))
for i, (k, v) in enumerate(sorted_words):
    words[k] = i
Initially we have:
words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
To reorder from minimum to maximum, you may use sorted and dictionary comprehension:
std = sorted(words, key=lambda x: words[x])
newwords = {word:std.index(word) for word in std}
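Note that std.index(word) scans the list once per word, so that comprehension is quadratic in the number of words. enumerate gives the same mapping in a single pass after the sort. A small sketch with made-up values that contain gaps:

```python
words = {'pear': 7, 'apple': 0, 'orange': 4}  # made-up values with gaps

std = sorted(words, key=words.get)  # keys ordered by their current value
newwords = {word: i for i, word in enumerate(std)}
print(newwords)  # {'apple': 0, 'orange': 1, 'pear': 2}
```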
You are using the wrong tool (dict) for the job; you should use a list:
class vocabulary:
    def __init__(self, *words):
        self.words = list(words)
    def __getitem__(self, key):
        try:
            return self.words.index(key)
        except ValueError:
            print(key + " is not in vocabulary")
    def remove(self, word):
        if isinstance(word, int):
            del self.words[word]
            return
        return self.remove(self[word])
words = vocabulary("apple", "banana", "orange")
print(words["banana"])  # outputs 1
words.remove("apple")
print(words["banana"])  # outputs 0
A note on complexity
I had several comments mentioning that a dict is more efficient because its lookup time is O(1) and the lookup time of a list is O(n).
This is simply not true in this case.
The O(1) guarantee of a hash table (dict in python), is a result of an amortised complexity, meaning, that you average a common usage of lookup table that is generated once, assuming that your hash function is balanced.
This amortised calculation does not take into account deleting the entire dictionary and regenerating it every time you remove an item, as some of the other answers suggest.
The list implementation and the dict implementation have the same worst-case complexity of O(n).
Yet, the list implementation could be optimised with two lines of python (bisect) to have a worst-case complexity of O(log(n))
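The bisect remark isn't spelled out in the answer. One possible reading, sketched under the assumption that the vocabulary is kept in alphabetical order (the class above keeps insertion order, so this changes which index each word gets): bisect_left then finds a word in O(log n) comparisons. Deleting from a list still shifts the later elements, so removal itself stays O(n). SortedVocabulary is a hypothetical name.

```python
from bisect import bisect_left

class SortedVocabulary:
    """Words kept sorted; a word's index is its alphabetical position."""

    def __init__(self, *words):
        self.words = sorted(set(words))

    def __getitem__(self, word):
        i = bisect_left(self.words, word)  # binary search
        if i < len(self.words) and self.words[i] == word:
            return i
        raise KeyError(word)

    def remove(self, word):
        del self.words[self[word]]  # O(n): later items shift left

vocab = SortedVocabulary("banana", "apple", "orange")
print(vocab["banana"])  # 1
vocab.remove("apple")
print(vocab["banana"])  # 0
```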
