Removing Entries From Dictionaries - python

Say I have a dictionary with a lot of entries but I only need the first 5-10 entries to be printed, how would I go about doing this? I thought about using a for loop but I cannot find a way to make that work with dictionaries since as far as I am aware you cannot access dictionary values without knowing the key names. I also tried converting the dictionary into a list of tuples but this causes the order of the entries to be changed in an unwanted way. Any tips?

For dictionary d, to print the first n values:
print(list(d.values())[:n])
If the dictionary represents word counts and you want the top n words:
d = {'red': 4, 'blue': 2, 'yellow': 1, "green":5} # Example dictionary
sorted_d = sorted(d.items(), key = lambda kv: -kv[1]) # Sort descending
n = 2 # number of values wanted
print(sorted_d[:n]) # list of top n tuples
# Out: [('green', 5), ('red', 4)]
You can also get the words and counts as separate tuples:
words, counts = zip(*sorted_d) # split into words and values
print(counts[:n]) # counts of top n words
# Out: (5, 4) # top n values
Another option is to convert the dictionary to a Counter
from collections import Counter
c = Counter(d)
print(c.most_common(n)) # Shows the n most common items in dictionary
# Out: [('green', 5), ('red', 4)]
If you use a Counter, you can also have it count the words in the first place, as explained in "Counting words with Python's Counter".
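As a minimal sketch of that last idea, a Counter can be built directly from the words of a text. The sample sentence and the plain whitespace split are assumptions made for illustration; real text would usually need punctuation and case handling first.

```python
from collections import Counter

# Hypothetical sample text; splitting on whitespace is a simplification.
text = "red green red blue green red"
word_counts = Counter(text.split())

n = 2
top_n = word_counts.most_common(n)  # list of (word, count) tuples, highest count first
print(top_n)
# Out: [('red', 3), ('green', 2)]
```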

If asd is your dictionary
asd = {'a':'1', 'b':'2', 'c':'3'}
for i, el in enumerate(asd.values()):
    if i < 5:
        print(el)
This will print at most the first 5 values, regardless of the names of the keys.
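If the dictionary is large, there is no need to copy all of it into a list first. Here is a sketch using itertools.islice, assuming Python 3.7 or later (where dictionaries keep insertion order); the sample dictionary is made up for illustration.

```python
from itertools import islice

# Hypothetical sample data; dicts preserve insertion order in Python 3.7+.
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
n = 3

# islice stops after n entries, so the rest of the dictionary is never copied.
first_items = list(islice(d.items(), n))
for key, value in first_items:
    print(key, value)
```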

Related

Dictionary comprehension Bigram

I have a Python assignment to extract bigrams from a string into a dictionary. I think I found a solution online, but I can't remember where. It seems to work, but I am having trouble understanding it as I am new to Python. Can anyone explain the code below, which takes a string, extracts pairs of characters, counts how often each pair occurs, and puts the result into a dictionary?
'''
s = 'Mississippi' # Your string
# Dictionary comprehension
dic_ = {k : s.count(k) for k in{s[i]+s[i+1] for i in range(len(s)-1)}}
'''
First let's understand comprehensions:
A list, dict, set, etc. can be built with a comprehension. Basically, a comprehension takes a generator expression and uses it to build a new collection. A generator is just an object that yields the next value each time it is asked for one. Using a list as the example: a list comprehension takes the values the generator produces and puts each one into its own slot in a list. Take this generator expression, for example:
(x for x in range(0, 10))
This gives 0 on the first iteration, then 1, then 2, and so on. To make this a list, we use [] (list brackets) like so:
[x for x in range(0, 10)]
This would give:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] #note: range does not include the second input
For a dictionary and for a set we use {}, but since dictionaries use key-value pairs, the expression differs between sets and dictionaries. For a set it is the same as for a list:
{x for x in range(0, 10)} #gives the set --> {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
For a dictionary, however, we need a key and a value. Since enumerate yields pairs of two items, it can be useful for dictionaries in some cases:
{key: value for key, value in enumerate([1,2,3])}
In this case the keys are the indexes and the values are the items in the list. So this gives:
{0: 1, 1: 2, 2: 3} #dictionary
It doesn't make a set because we write key: value, which is the format for items in a dictionary, not a set.
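The forms described above can be compared side by side in a short sketch. The variable names are only illustrative. Note that a generator expression on its own needs parentheses.

```python
gen = (x * 2 for x in range(3))         # generator expression: values produced lazily
as_list = [x * 2 for x in range(3)]     # list comprehension
as_set = {x * 2 for x in range(3)}      # set comprehension
as_dict = {x: x * 2 for x in range(3)}  # dict comprehension, written as key: value

first = next(gen)  # takes one value out of the generator
rest = list(gen)   # whatever the generator has not produced yet
print(as_list, as_set, as_dict, first, rest)
```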
Now, let's break this down:
This part of the code:
{s[i]+s[i+1] for i in range(len(s)-1)}
builds a set containing every pair of adjacent letters. s[i] is one letter and s[i+1] is the letter after it, so the expression says: take this pair (s[i]+s[i+1]) and do so for every position in the string (for i in range(len(s)-1)). Notice the -1: the last letter has no letter after it, so we don't want to run it for the last letter.
Now that we have a set let's save it to a variable so it's easier to see:
setOfPairs = {s[i]+s[i+1] for i in range(len(s)-1)}
Then our original comprehension would change to:
{k : s.count(k) for k in setOfPairs}
This says we want a dictionary with keys k and values s.count(k). Since every k comes from our set of pairs (for k in setOfPairs), the keys of the dictionary are the pairs. Since s.count(k) returns the number of times k occurs in s, the values of the dictionary are the number of times each key appears in s.
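Running the two steps separately on the example string makes the intermediate values visible. This is only a sketch of the breakdown above; sorted() is used just to get a stable display order, since sets are unordered.

```python
s = 'Mississippi'

# Step 1: the set of distinct adjacent pairs
setOfPairs = {s[i] + s[i + 1] for i in range(len(s) - 1)}
print(sorted(setOfPairs))
# Out: ['Mi', 'ip', 'is', 'pi', 'pp', 'si', 'ss']

# Step 2: map each pair to how often it occurs in s
dic_ = {k: s.count(k) for k in setOfPairs}
print(sorted(dic_.items()))
# Out: [('Mi', 1), ('ip', 1), ('is', 2), ('pi', 1), ('pp', 1), ('si', 2), ('ss', 2)]
```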
Let's take this apart one step at a time:
1. s[i] selects the i-th letter of the string s.
2. s[i]+s[i+1] concatenates the letter at position i and the letter at position i+1.
3. s[i]+s[i+1] for i in range(len(s)-1) iterates over each index i (except the last one) and so computes all the bigrams.
4. Since the expression in 3 is surrounded by curly brackets, the result is a set, meaning that all duplicate bigrams are removed.
5. for k in {s[i]+s[i+1] for i in range(len(s)-1)} therefore iterates over all unique bigrams in the given string s.
6. Lastly, {k : s.count(k) for k in{s[i]+s[i+1] for i in range(len(s)-1)}} maps each bigram k to the number of times it appears in s, because the str.count function returns the number of times a substring appears in a string.
I hope that helps. If you want to know more about list/set/dict comprehensions in Python, the relevant entry in the Python documentation is here: https://docs.python.org/3/tutorial/datastructures.html?highlight=comprehension#list-comprehensions
dic_ = {k : s.count(k) for k in{s[i]+s[i+1] for i in range(len(s)-1)}}
Read it backwards:
dic_ = {k : s.count(k)
    # Step 3: for each pair of characters, count how many times it occurs in the string,
    # then store the 2-character string as the key and the count as the value.
    for k in {s[i]+s[i+1]
        # Step 2: at each position, take 2 characters out of the string.
        for i in range(len(s)-1)}}
        # Step 1: loop over all but the last character of the string.
The code may be inefficient for long strings. The set in step 2 removes duplicate pairs, but s.count(k) in step 3 still rescans the whole string once for every distinct pair.
Refactoring so that the pairs are counted in a single pass over the string may speed it up. Benchmark it on, say, a billion-base-pair DNA sequence.
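One possible refactoring, sketched under the assumption that collections.Counter is acceptable for the assignment, is to count every adjacent pair in a single pass instead of calling s.count for each key. The helper name bigram_counts is made up for illustration. Note one difference in behaviour: str.count counts non-overlapping occurrences, so the two approaches can disagree on strings such as 'aaa'.

```python
from collections import Counter

def bigram_counts(s):
    """Count adjacent character pairs in one pass (hypothetical helper name)."""
    return Counter(a + b for a, b in zip(s, s[1:]))

counts = bigram_counts('Mississippi')
print(counts['ss'])  # 'ss' starts at two different positions
# Out: 2

# Overlapping pairs: the sliding window sees 'aa' twice in 'aaa',
# while 'aaa'.count('aa') finds only one non-overlapping match.
overlap = bigram_counts('aaa')['aa']
non_overlap = 'aaa'.count('aa')
```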

Fill list using Counter with 0 values

Is it possible to count how many times a value appears in a given list, and get 0 if the item is not in the list?
I have to use zip, but the first list has 5 items and the other one, created using count, has only 3. That's why I need to fill the other two positions with 0 values.
You can do this with itertools.zip_longest.
With zip_longest you can zip two lists of different lengths; the missing values are filled with None by default. You can define a suitable fill value, as I have done below.
from itertools import zip_longest
a = ['a','b','c','d','e']
b = [1,4,3]
final_lst = list(zip_longest(a,b, fillvalue=0))
final_dict = dict(list(zip_longest(a,b, fillvalue=0))) #you may convert answer to dictionary if you wish
Alternatively, if what you are trying to do is count how many times the items of a reference list appear in another list (also recording reference items that don't appear in it), you can use a dictionary comprehension:
ref_list = ['a','b','c','d','e']#reference list
other_list = ['a','b','b','d','a','d','a','a','a']
count_dict = {n:other_list.count(n) for n in ref_list}
print (count_dict)
Output
{'a': 5, 'b': 2, 'c': 0, 'd': 2, 'e': 0}
Use collections.Counter, and then call get with a default value of 0 to see how many times any given element appears:
>>> from collections import Counter
>>> counts = Counter([1, 2, 3, 1])
>>> counts.get(1, 0)
2
>>> counts.get(2, 0)
1
>>> counts.get(5, 0)
0
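Indexing a Counter directly also returns 0 for a missing element, which makes it easy to build a list of counts aligned with a reference list before zipping. Here is a minimal sketch; the sample lists are the ones used in the dictionary-comprehension answer, and the names aligned and pairs are illustrative.

```python
from collections import Counter

ref_list = ['a', 'b', 'c', 'd', 'e']
other_list = ['a', 'b', 'b', 'd', 'a', 'd', 'a', 'a', 'a']

counts = Counter(other_list)             # one pass over other_list
aligned = [counts[x] for x in ref_list]  # missing items give 0, no KeyError
pairs = list(zip(ref_list, aligned))
print(pairs)
# Out: [('a', 5), ('b', 2), ('c', 0), ('d', 2), ('e', 0)]
```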
If you want to count how many times a value appears in a list, you could do this:
def count_in_list(list_,value):
    count=0
    for e in list_:
        if e==value:
            count+=1
    return count
And use the code like this:
MyList=[1,3,1,1,1,1,1,2]
count_in_list(MyList,1)
Output:
6
This will work without any additional things such as imports.
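For comparison, the built-in list.count method does the same job without a hand-written loop and also needs no imports. A short sketch using the same sample list:

```python
MyList = [1, 3, 1, 1, 1, 1, 1, 2]

ones = MyList.count(1)     # number of elements equal to 1
missing = MyList.count(7)  # a value that is absent simply gives 0
print(ones, missing)
# Out: 6 0
```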

Python: Compare Lists

I have two lists a and b:
a = ['146769015', '163081689', '172235774', ...]
b = [['StackOverflow (146769015)'], ['StackOverflow (146769015)'], ['StackOverflow (163081689)'], ...]
What I'm trying to do is to check if the elements of list a are in list b, and if they are, how many times they appear.
In this case the output should be:
'146769015':2
'163081689':1
I've already tried the set() function but that does not seem to work
print(set(a)&set(b))
And I get this:
print(set(a)&set(b))
TypeError: unhashable type: 'list'
Is it possible to do what I want?
Thank you all.
When you perform set(a) & set(b), you're trying to see which elements both lists share. There are a couple of errors in your logic.
First, your first list is made up of strings, while your second list is made up of lists. Lists are unhashable, hence the TypeError.
Second, the elements of your second list never equal those of your first, because the first holds only the numbers while the second holds the numbers together with other text.
Third, even if you extract only the numbers, the intersection of the two sets tells you which numbers are in both, but not how many times they appear.
A good approach might be to extract the numbers from your second list and then count the occurrences of those that are present in list a:
from collections import Counter
import re
a=['146769015', '163081689', '172235774']
b=[['StackOverflow (146769015)'],['StackOverflow (146769015)'],['StackOverflow (163081689)']]
numbs = [re.search(r'\d+', elem[0]).group(0) for elem in b]  # raw string for the regex
cnt = Counter()
for n in numbs:
    if n in a:
        cnt[n] += 1
print(cnt)
Output:
Counter({'146769015': 2, '163081689': 1})
I'll leave it as homework for you to research what dictionaries and Counters are.
It's tricky because each string in a is only a substring of the strings in b; otherwise I think you could use a Counter from collections and look up each element of a as a key.
Instead, you can flatten the list and loop through it with nested loops.
from collections import defaultdict
flat_list = [item for sublist in b for item in sublist]
c = defaultdict(lambda: 0)
for string in a:
    for string2 in flat_list:
        if string in string2:
            c[string] += 1
You can use a dictionary:
a=['146769015', '163081689', '172235774']
b=[['StackOverflow (146769015)'],['StackOverflow (146769015)'],['StackOverflow (163081689)']]
c = {}
for s in a:
    for d in b:
        for i in d:
            if s in i:
                if s not in c:
                    c[s] = 1
                else:
                    c[s] += 1
print(c)
Output:
{'146769015': 2, '163081689': 1}
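A substring test such as "s in i" can overcount if one ID happens to be contained in another (for example '146' inside '146769015'). Here is a sketch that extracts the ID exactly and checks membership against a set. The regular expression assumes every entry in b contains its ID in parentheses, as in the sample data; the names wanted and id_pattern are illustrative.

```python
import re
from collections import Counter

a = ['146769015', '163081689', '172235774']
b = [['StackOverflow (146769015)'], ['StackOverflow (146769015)'],
     ['StackOverflow (163081689)']]

wanted = set(a)                         # fast membership tests
id_pattern = re.compile(r'\((\d+)\)')   # digits enclosed in parentheses

cnt = Counter()
for sublist in b:
    for text in sublist:
        match = id_pattern.search(text)
        # Count only whole IDs that appear in list a
        if match and match.group(1) in wanted:
            cnt[match.group(1)] += 1

print(dict(cnt))
# Out: {'146769015': 2, '163081689': 1}
```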

How to get dictionary key whose value contains at least one item in another list?

I have written a simple script the scope of which is:
list=[1,19,46,28 etc...]
dictionary={Joey:(10,2,6,19), Emily: (0,3), etc}
Now I need to find all the keys of the dictionary that have at least one of the list entries among their values.
Example: 19 is in Joey's values, so Joey is the winner.
How I did it (I'm not a programmer at all):
# NodesOfSet = the list
# elementsAndTheirNodes = the dictionary
# loop as many times as the number of key:value entries in the dictionary element:nodes
# simply: loop over all the elements
for i in range (0, len (elementsAndTheirNodes.keys())):
    # there is an indent here (otherwise it wouldnt work anyway)
    # loop over the tuple that serves as the value for each key for a given i-th key:value
    # simply: loop over all their nodes
    for j in range (0, len (elementsAndTheirNodes.values()[i])):
        # test: this prints out element + 1 node and so on
        # print (elementsAndTheirNodes.keys()[i], elementsAndTheirNodes.values()[i][j] )
        for k in range (0, len (NodesOfSet)):
            if NodesOfSet[k] == (elementsAndTheirNodes.values()[i][j]):
                print ( elementsAndTheirNodes.keys()[i], " is the victim")
            else:
                print ( elementsAndTheirNodes.keys()[i], " is not the victim")
But this is very time-consuming, as it iterates over basically everything in the database. May I ask for help optimizing this? Thanks!
I would use a list comprehension and the built-in any, which short-circuits once a shared item is found. Turning your list into a set reduces the complexity of each membership lookup from O(n) to O(1) on average:
s = set(lst)
result = [k for k, v in dct.items() if any(i in s for i in v)]
Be careful to not assign builtins as the names for your objects (e.g. list) to avoid making the builtin unusable later on in your code.
Don't use the name list; list is the name of a built-in type.
l = [1, 19, 46, 28, ...]
l_set = set(l)
d = {'Joey':(10,2,6,19), 'Emily': (0,3), ...}
winners = [k for k, v in d.items() if any(i in l_set for i in v)]
any will stop iterating through v as soon as it "sees" a shared value, saving some time.
You could also use set intersection to check if any of the elements in the dictionary value tuples have anything in common with your "list" entries:
l = [1,19,46,28, ...]
s = set(l)
d = {'Joey':(10,2,6,19), 'Emily': (0,3), ...}
winners = [k for k, v in d.items() if s.intersection(v)]
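Put together as a runnable sketch with small sample data (the names and numbers are illustrative, based on the example in the question). set.isdisjoint is another way to express the same test.

```python
nodes = [1, 19, 46, 28]
d = {'Joey': (10, 2, 6, 19), 'Emily': (0, 3)}

node_set = set(nodes)

# any() short-circuits on the first shared value
winners_any = [k for k, v in d.items() if any(i in node_set for i in v)]

# isdisjoint() returns True when nothing is shared, so negate it
winners_disjoint = [k for k, v in d.items() if not node_set.isdisjoint(v)]

print(winners_any, winners_disjoint)
# Out: ['Joey'] ['Joey']
```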

How to sort a dictionary in a list using comparison to another list

I'm working on a bigger program and need to sort a list of dictionaries. I'm using the integer values from a separate list and comparing them to the values associated with a specific key in each dictionary. I'm getting messed up results with this, and I have no idea why... help!
My code:
new_list_of_dictionaries = []
b = [{1: 'one', 'sam': '2,300'}, {3: 'thee', 'sam': '4,000'}]
list_of_integers = [2300, 2300]
for i in list_of_integers:
    for a_dictionary in b:
        r = a_dictionary["sam"].replace(',','')
        #print r
        #r2 = r.replace(',','')
        #print r2
        if i == int(r):
            new_list_of_dictionaries.append(a_dictionary)
print(new_list_of_dictionaries)
It looks like you are filtering the list of dictionaries based on the numbers in the separate list. You can use a list comprehension with the filtering condition, like this:
[c_dict for c_dict in b if int(c_dict["sam"].replace(',','')) in list_of_ints]
The in operator looks for the integer value in list_of_ints. If you want to speed it up, you can convert list_of_ints to a set, like this:
set_of_ints = set(list_of_ints)
[c_dict for c_dict in b if int(c_dict["sam"].replace(',','')) in set_of_ints]
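A runnable sketch of the set-based filter, using the sample data from the question. It also suggests one likely reason for the unexpected results of the original loop: 2300 appears twice in the list of integers, so the matching dictionary is appended twice, whereas the filter keeps each dictionary once. The helper name sam_as_int is made up for illustration.

```python
b = [{1: 'one', 'sam': '2,300'}, {3: 'thee', 'sam': '4,000'}]
list_of_ints = [2300, 2300]

def sam_as_int(a_dictionary):
    """Parse the 'sam' value, e.g. '2,300' -> 2300 (hypothetical helper)."""
    return int(a_dictionary['sam'].replace(',', ''))

set_of_ints = set(list_of_ints)  # duplicates collapse, lookups are fast
filtered = [c_dict for c_dict in b if sam_as_int(c_dict) in set_of_ints]
print(filtered)
# Out: [{1: 'one', 'sam': '2,300'}]

# If an actual ordering is wanted, sort on the parsed number (descending here).
by_sam = sorted(b, key=sam_as_int, reverse=True)
```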
