I have a dictionary in the form:
{"a": (1, 0.1), "b": (2, 0.2), ...}
Each tuple corresponds to (score, standard deviation).
How can I take the average of just the first integer in each tuple?
I've tried this:
for word in d:
    (score, std) = d[word]
    d[word]=float(score),float(std)
    if word in string:
        number = len(string)
        v = sum(score)
        return (v) / number
I get this error:
v = sum(score)
TypeError: 'int' object is not iterable
It's easy to do using list comprehensions. First, you can get all the dictionary values from d.values(). To make a list of just the first item in each value you make a list like [v[0] for v in d.values()]. Then, just take the sum of those elements, and divide by the number of items in the dictionary:
sum([v[0] for v in d.values()]) / float(len(d))
As Pedro rightly points out, this actually creates the list, and then does the sum. If you have a huge dictionary, this might take up a bunch of memory and be inefficient, so you would want a generator expression instead of a list comprehension. In this case, that just means getting rid of one pair of brackets:
sum(v[0] for v in d.values()) / float(len(d))
The two methods are compared in another question.
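The TypeError in the question comes from calling sum() on score, which is a single int at that point rather than an iterable. Below is a minimal sketch of the generator-expression approach, assuming Python 3 (where / is true division, so float() is not needed) and a small made-up dictionary. statistics.mean is shown as an alternative, not as part of the original answer.

```python
from statistics import mean

# Hypothetical sample data in the (score, standard deviation) form from the question.
d = {"a": (1, 0.1), "b": (2, 0.2), "c": (6, 0.3)}

# Generator expression: sum the first element of every tuple, then divide by the count.
average = sum(score for score, _std in d.values()) / len(d)

# statistics.mean accepts any iterable of numbers and gives an equal result here.
average_via_mean = mean(score for score, _std in d.values())

print(average)  # 3.0
```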
I have a Python assignment to extract bigrams from a string into a dictionary. I think I found the solution online, but I can't remember where. It seems to work, but I am having trouble understanding it as I am new to Python. Can anyone explain the code below, which takes a string, extracts pairs of characters, counts the instances of each pair, and puts the counts into a dictionary?
'''
s = 'Mississippi' # Your string
# Dictionary comprehension
dic_ = {k : s.count(k) for k in{s[i]+s[i+1] for i in range(len(s)-1)}}
'''
First let's understand comprehensions:
A list, dict, set, etc. can be made with a comprehension. Basically, a comprehension takes a generator and uses it to build a new object. A generator is just an object that returns a different value on each iteration. Using a list as an example: to make a list with a list comprehension, we take the values that the generator outputs and put each one into its own spot in a list. Take this generator, for example:
x for x in range(0, 10)
This will just give 0 on the first iteration, then 1, then 2, etc. To make this a list, we would use [] (list brackets), like so:
[x for x in range(0, 10)]
This would give:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] #note: range does not include the second input
For a dictionary and for a set we use {}, but since dictionaries use key-value pairs, our generator will be different for sets and dictionaries. For a set, it is the same as for a list:
{x for x in range(0, 10)} #gives the set --> {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
but for a dictionary we need a key and a value. Since enumerate gives two items this could be useful for dictionaries in some cases:
{key: value for key, value in enumerate([1,2,3])}
In this case the keys are the indexes and the values are the items in the list. So this gives:
{0: 1, 1: 2, 2: 3} #dictionary
It doesn't make a set because we denote x : y which is the format for items in a dictionary, not a set.
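As a short sketch of the same idea, assuming Python 3: the same for clause gives a different object depending on the brackets around it. Note that a bare x for x in range(0, 10) is not valid on its own line; a generator expression needs parentheses (or has to be the only argument of a function call).

```python
# The same loop clause produces a different object depending on the brackets.
gen = (x * x for x in range(4))          # generator: values are produced lazily
as_list = [x * x for x in range(4)]      # list comprehension
as_set = {x * x for x in range(4)}       # set comprehension
as_dict = {x: x * x for x in range(4)}   # dict comprehension (key: value)

print(as_list)   # [0, 1, 4, 9]
print(as_dict)   # {0: 0, 1: 1, 2: 4, 3: 9}

# A generator can only be consumed once.
first_pass = list(gen)    # [0, 1, 4, 9]
second_pass = list(gen)   # [] because the generator is already exhausted
```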
Now, let's break this down:
This part of the code:
{s[i]+s[i+1] for i in range(len(s)-1)}
is making a set of every pair of touching letters. s[i] is one letter and s[i+1] is the letter after it, so it is saying: get this pair (s[i]+s[i+1]) and do it for every item in the string (for i in range(len(s)-1)). Notice there is a -1, since the last letter does not have a touching letter after it (so we don't want to run it for the last letter).
Now that we have a set let's save it to a variable so it's easier to see:
setOfPairs = {s[i]+s[i+1] for i in range(len(s)-1)}
Then our original comprehension would change to:
{k : s.count(k) for k in setOfPairs}
This is saying we want to make a dictionary that has keys of k and values of s.count(k). Since we get every k from our set of pairs (for k in setOfPairs), the keys of the dictionary are the pairs. Since s.count(k) returns the number of times k is in s, the values of the dictionary are the number of times each key appears in s.
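Run on the string from the question, the two steps give the result below. This is a sketch of the same code split into named parts; set_of_pairs is just an illustrative name.

```python
s = 'Mississippi'

# Step 1: every pair of adjacent letters, with duplicates removed by the set.
set_of_pairs = {s[i] + s[i + 1] for i in range(len(s) - 1)}

# Step 2: map each unique pair to the number of times it occurs in s.
dic_ = {k: s.count(k) for k in set_of_pairs}

# Sets are unordered, so sort the items for a stable display.
print(sorted(dic_.items()))
# [('Mi', 1), ('ip', 1), ('is', 2), ('pi', 1), ('pp', 1), ('si', 2), ('ss', 2)]
```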
Let's take this apart one step at a time:
1. s[i] is the code to select the i-th letter in the string s.
2. s[i]+s[i+1] concatenates the letter at position i and the letter at position i+1.
3. s[i]+s[i+1] for i in range(len(s)-1) iterates over each index i (except the last one) and so computes all the bigrams.
4. Since the expression in 3 is surrounded by curly brackets, the result is a set, meaning that all duplicate bigrams are removed.
5. for k in {s[i]+s[i+1] for i in range(len(s)-1)} therefore iterates over all unique bigrams in the given string s.
6. Lastly, {k : s.count(k) for k in{s[i]+s[i+1] for i in range(len(s)-1)}} maps each bigram k to the number of times it appears in s, because the str.count function returns the number of (non-overlapping) times a substring appears in a string.
I hope that helps. If you want to know more about list/set/dict comprehensions in Python, the relevant entry in the Python documentation is here: https://docs.python.org/3/tutorial/datastructures.html?highlight=comprehension#list-comprehensions
dic_ = {k : s.count(k) for k in{s[i]+s[i+1] for i in range(len(s)-1)}}
Read backwards
dic_ = {k : s.count(k)
# Step 3: for each pair of characters,
# count how many times it occurs in the string and
# store the 2-char string as the key and the count as the value in the dictionary.
for k in {s[i]+s[i+1]
# Step 2: from each position, take 2 characters out of the string.
for i in range(len(s)-1)}}
# Step 1: loop over all but the last character of the string.
The code may be inefficient for long strings. The set in Step 2 removes duplicate pairs, so each pair is counted only once, but every call to s.count(k) still scans the whole string, so the string is rescanned once per unique pair.
Refactoring so that each pair is tallied as it is seen (testing whether the key already exists instead of calling count) may speed it up. Benchmark the time on, say, a billion-base-pair DNA sequence.
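One possible single-pass refactoring is sketched here with collections.Counter, which is not part of the original answer; count_bigrams is an illustrative name. It differs from the str.count version in one respect: str.count counts non-overlapping occurrences, so the two approaches disagree on strings such as 'aaa'.

```python
from collections import Counter

def count_bigrams(s):
    """Count every adjacent pair of characters in one pass over s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

bigrams = count_bigrams('Mississippi')
print(bigrams['ss'])   # 2

# Overlapping pairs: 'aaa' contains 'aa' at positions 0 and 1.
print(count_bigrams('aaa')['aa'])  # 2
print('aaa'.count('aa'))           # 1, because str.count skips overlapping matches
```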
I have the following dict:
{('I', 'like'):14, ('he','likes'):2, ('I', 'hate'):12}
For a given word, I want to get the second element of every tuple key in the dictionary that has this word as its first element.
I tried:
word='I'
second_word = (k[0][1] for k, v in d if word == k[0][0])
print(second_word)
and expected to get "like" as an answer but got:
<generator object generate_ngram_sentences.<locals>.<genexpr> at 0x7fed65bd0678>
<generator object generate_ngram_sentences.<locals>.<genexpr> at 0x7fed65bd0678>
<generator object generate_ngram_sentences.<locals>.<genexpr> at 0x7fed65bd06d0>
How can I get not only the first occurrence but all such occurrences in the dictionary?
EDIT:
2. Can you share how this could be modified if the size of the tuple is dynamic, so that the key of the dict could be, e.g., a 2-element or a 15-element tuple, depending on the dict?
You have the correct idea, but it needed to be fine tuned:
d = {('I', 'like'):14, ('he','likes'):2, ('I', 'hate'):12}
word='I'
second_word = [k[1] for k in d if k[0] == word]
print(second_word)
The output is a list of all second words for all the keys whose first item is 'I':
['like', 'hate']
from there:
second_word[0] --> 'like'
I steal the first sentence from Reblochon Masque's answer:
You have the correct idea, but it needed to be fine tuned:
second_word = (k[0][1] for k, v in d if word == k[0][0])
Iterating directly over d generates the keys only (which are what you are interested in, so this was the right idea).
Now, for k, v in d actually works, not because you get the key and the value, but because the key is a tuple and you unpack the two items in the tuple to the names k and v.
So k already is the first word and v is the second word, and you don't need to use any indexing like [0][0] or [0][1].
Using different names makes it clearer:
word = 'I'
second_words = (second for first, second in d if first == word)
Note that now second_words is a generator expression and not a list. If you simply go on iterating over second_words this is fine, but if you actually want the list, change the generator expression to a list comprehension by replacing the () by [].
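For the follow-up about keys of varying length, one option (a sketch, not taken from the answers above) is to slice the key instead of unpacking it, so the same expression works for tuples of 2, 15, or any other number of elements. The sample dictionary and the helper name rest_of_keys are made up for illustration.

```python
def rest_of_keys(d, word):
    """Return the remainder of every tuple key whose first element is word."""
    return [key[1:] for key in d if key and key[0] == word]

# Hypothetical data with keys of different lengths.
d = {('I', 'like'): 14, ('he', 'likes'): 2, ('I', 'hate', 'rain'): 12}

print(rest_of_keys(d, 'I'))   # [('like',), ('hate', 'rain')]
```

Each result is itself a tuple, so key[1:][0] gives the second word when only that one is needed.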
Let's assume that there is a dictionary like this one:
lst = {(1,1):2, (1,2):5, (1,3):10, (1,4):14, (1,6):22}
I want a simple (and the most efficient) function that returns the dictionary key whose value is the maximum.
For example:
key_for_max_value_in_dict(lst) = (1,6)
because the tuple (1,6) has the largest value (22).
I came up with this code which might be the most efficient one:
max(lst, key=lambda x: lst[x])
You can use a generator expression for that:
Code:
max((v, k) for k, v in lst.items())[1]
How does it work?
Iterate over the items() in the dict, and emit them as tuples of (value, key) with the value first in the tuple. max() can then find the largest value, because tuples sort by each element in the tuple, with first element matching first element. Then take the second element ([1]) of the max tuple since it is the key value for the max value in the dict.
Test Code:
lst = {(1,1):2, (1,2):5, (1,3):10, (1,4):14, (1,6):22}
print(max((v, k) for k, v in lst.items())[1])
Results:
(1, 6)
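The key= form and the (value, key) form can differ when several keys share the maximum value. A small sketch, assuming Python 3.7+ (insertion-ordered dictionaries) and a made-up dictionary with a tie:

```python
# Hypothetical data: (1, 2) and (1, 6) both have the maximum value 22.
lst = {(1, 1): 2, (1, 2): 22, (1, 6): 22}

# key= compares values only; on a tie, max returns the first key encountered.
by_key_function = max(lst, key=lst.get)

# (value, key) tuples compare the key as a tie-breaker, so the larger key wins.
by_tuple_order = max((v, k) for k, v in lst.items())[1]

print(by_key_function)  # (1, 2)
print(by_tuple_order)   # (1, 6)
```

The tuple form also needs the keys themselves to be comparable whenever two values are equal, which the key= form does not.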
Assuming you're using a regular unsorted dictionary, you'll need to walk down the entire thing once. Keep track of what the largest element is and update it if you see a larger one. If it is the same, add to the list.
largest_key = []
largest_value = float("-inf")  # lower than any real value, so negative values work too
for key, value in lst.items():
    if value > largest_value:
        largest_value = value
        largest_key = [key]
    elif value == largest_value:
        largest_key.append(key)
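The same result can be sketched in two passes with built-ins: one call to max to find the largest value, then a list comprehension to collect every key that has it. The function name keys_with_max_value is illustrative.

```python
def keys_with_max_value(mapping):
    """Return a list of all keys whose value equals the maximum value."""
    if not mapping:
        return []
    largest_value = max(mapping.values())
    return [key for key, value in mapping.items() if value == largest_value]

lst = {(1, 1): 2, (1, 2): 5, (1, 3): 10, (1, 4): 14, (1, 6): 22}
print(keys_with_max_value(lst))   # [(1, 6)]
```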
I have written a simple script the scope of which is:
list=[1,19,46,28 etc...]
dictionary={Joey:(10,2,6,19), Emily: (0,3), etc}
Now I need to find all the keys of the dictionary which have at least one of the list entries in the values
Example: 19 is in Joeys values, so Joey is the winner.
How I did it: (no programmer at all)
# NodesOfSet = the list
# elementsAndTheirNodes = the dictionary
# loop as many times as the number of key:value entries in the dictionary element:nodes
# simply: loop over all the elements
for i in range (0, len (elementsAndTheirNodes.keys())):
    # there is an indent here (otherwise it wouldnt work anyway)
    # loop over the tuple that serves as the value for each key for a given i-th key:value
    # simply: loop over all their nodes
    for j in range (0, len (elementsAndTheirNodes.values()[i])):
        # test: this prints out element + 1 node and so on
        # print (elementsAndTheirNodes.keys()[i], elementsAndTheirNodes.values()[i][j] )
        for k in range (0, len (NodesOfSet)):
            if NodesOfSet[k] == (elementsAndTheirNodes.values()[i][j]):
                print ( elementsAndTheirNodes.keys()[i], " is the victim")
            else:
                print ( elementsAndTheirNodes.keys()[i], " is not the victim")
But this is very time consuming, as it iterates over basically everything in the database. May I ask for help optimizing this? Thanks!
I would use a list comprehension and the builtin any, which short-circuits once a shared item is found. Turning your list into a set reduces the complexity of the membership lookup from O(n) to O(1):
s = set(lst)
result = [k for k, v in dct.items() if any(i in s for i in v)]
Be careful to not assign builtins as the names for your objects (e.g. list) to avoid making the builtin unusable later on in your code.
Don't use the name list for your own object; list is the name of a built-in type.
l = [1, 19, 46, 28, ...]
l_set = set(l)
d = {'Joey':(10,2,6,19), 'Emily': (0,3), ...}
winners = [k for k, v in d.items() if any(i in l_set for i in v)]
any will stop iterating through v as soon as it "sees" a shared value, saving some time.
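A runnable version with concrete sample data follows; the truncated list and dictionary from the question are replaced by short made-up ones, and the variable names are illustrative.

```python
# Hypothetical stand-ins for the list and dictionary in the question.
nodes_of_set = [1, 19, 46, 28]
elements_and_their_nodes = {'Joey': (10, 2, 6, 19), 'Emily': (0, 3)}

# Set membership tests are O(1) on average, versus O(n) for a list.
node_set = set(nodes_of_set)

# any() stops scanning a tuple at the first node found in node_set.
winners = [name for name, nodes in elements_and_their_nodes.items()
           if any(node in node_set for node in nodes)]

print(winners)   # ['Joey']
```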
You could also use set intersection to check if any of the elements in the dictionary value tuples have anything in common with your "list" entries:
l = [1,19,46,28, ...]
s = set(l)
d = {'Joey':(10,2,6,19), 'Emily': (0,3), ...}
winners = [k for k, v in d.items() if s.intersection(v)]
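A related sketch, with made-up sample data: set.isdisjoint accepts any iterable and only answers whether there is a common element, so it avoids building the intersection set (in CPython it can stop at the first shared element).

```python
s = {1, 19, 46, 28}
d = {'Joey': (10, 2, 6, 19), 'Emily': (0, 3)}

# A key qualifies when its tuple is NOT disjoint from s.
winners = [k for k, v in d.items() if not s.isdisjoint(v)]
print(winners)   # ['Joey']
```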
I have:
([(5,2),(7,2)],[(5,1),(7,3),(11,1)])
I need to add the second elements having the same first element.
output:[(5,3),(7,5),(11,1)]
This is a great use-case for collections.Counter...
from collections import Counter
tup = ([(5,2),(7,2)], [(5,1),(7,3),(11,1)])
counts = sum((Counter(dict(sublist)) for sublist in tup), Counter())
result = list(counts.items())
print(result)
One downside here is that you'll lose the order of the inputs. They appear to be sorted by the key, so you could just sort the items:
result = sorted(counts.items())
A Counter is a dictionary whose purpose is to hold the "counts" of bins. Counters are cleverly designed so that you can simply add them together (which adds the counts "bin-wise" -- if a bin isn't present in both Counters, the missing bin's value is assumed to be 0). That explains why we can use sum on a bunch of Counters to get a dictionary that has the values that you want. Unfortunately for this solution, a Counter can't be instantiated from an iterable that yields 2-item sequences the way normal mappings can:
Counter([(1, 2), (3, 4)])
would create a Counter with keys (1, 2) and (3, 4) -- Both values would be 1. It does work as expected if you create it with a mapping however:
Counter(dict([(1, 2), (3, 4)]))
creates a Counter with keys 1 and 3 (and values 2 and 4).
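An alternative sketch that avoids building one Counter per sublist: Counter.update adds the counts from a mapping instead of replacing them. Note also that adding Counters with + (as sum does) drops entries whose total is zero or negative, whereas update keeps them.

```python
from collections import Counter

tup = ([(5, 2), (7, 2)], [(5, 1), (7, 3), (11, 1)])

counts = Counter()
for sublist in tup:
    # Unlike dict.update, Counter.update adds to existing counts.
    counts.update(dict(sublist))

print(sorted(counts.items()))   # [(5, 3), (7, 5), (11, 1)]
```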
Try this code (brute force, maybe):
dt = {}
tp = ([(5,2),(7,2)],[(5,1),(7,3),(11,1)])
for ls in tp:
    for t in ls:
        # add to the running total if the key was seen before, otherwise start it
        dt[t[0]] = dt[t[0]] + t[1] if t[0] in dt else t[1]
print(list(dt.items()))
The approach taken here is to loop through the list of tuples and store the tuple's data as a dictionary, wherein the 1st element in the tuple t[0] is the key and the 2nd element t[1] is the value.
Upon iteration, every time the same key is found in the tuple's 1st element, add the value with the tuple's 2nd element. In the end, we will have a dictionary dt with all the key, value pairs as required. Convert this dictionary to list of tuples dt.items() and we have our output.
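The membership test can be dropped by using collections.defaultdict, sketched here as a variant of the loop above (the name totals is illustrative):

```python
from collections import defaultdict

tp = ([(5, 2), (7, 2)], [(5, 1), (7, 3), (11, 1)])

totals = defaultdict(int)   # missing keys start at 0
for ls in tp:
    for key, value in ls:
        totals[key] += value

print(list(totals.items()))   # [(5, 3), (7, 5), (11, 1)]
```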