I have the following dict:
{('I', 'like'):14, ('he','likes'):2, ('I', 'hate'):12}
Given a word, I want to get the second element of every tuple key in the dictionary whose first element is that word.
I tried:
word='I'
second_word = (k[0][1] for k, v in d if word == k[0][0])
print(second_word)
and expected to get "like" as an answer but got:
<generator object generate_ngram_sentences.<locals>.<genexpr> at 0x7fed65bd0678>
<generator object generate_ngram_sentences.<locals>.<genexpr> at 0x7fed65bd0678>
<generator object generate_ngram_sentences.<locals>.<genexpr> at 0x7fed65bd06d0>
How can I get not just the first occurrence but all such occurrences in the dictionary?
EDIT:
2. Can you also show how this could be modified if the tuple size is dynamic, so that the dict keys could be, e.g., 2-element or 15-element tuples depending on the dict?
You have the correct idea, but it needed to be fine tuned:
d = {('I', 'like'):14, ('he','likes'):2, ('I', 'hate'):12}
word='I'
second_word = [k[1] for k in d if k[0] == word]
print(second_word)
The output is a list of the second words of all keys whose first item is 'I':
['like', 'hate']
from there:
second_word[0] --> 'like'
I steal the first sentence from Reblochon Masque's answer:
You have the correct idea, but it needed to be fine tuned:
second_word = (k[0][1] for k, v in d if word == k[0][0])
Iterating directly over d generates the keys only (which are what you are interested in, so this was the right idea).
Now, for k, v in d actually works, not because you get the key and the value, but because the key is a tuple and you unpack the two items in the tuple to the names k and v.
So k already is the first word and v is the second word, and you don't need to use any indexing like [0][0] or [0][1].
Using different names makes it clearer:
word = 'I'
second_words = (second for first, second in d if first == word)
Note that now second_words is a generator expression and not a list. If you simply go on iterating over second_words this is fine, but if you actually want the list, change the generator expression to a list comprehension by replacing the () by [].
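To tie the two answers together, here is a minimal sketch that materializes the generator and also covers the follow-up about keys of varying length. The helper name second_words and the longer sample keys are made up for illustration; indexing with k[0] and k[1] is used instead of two-name unpacking, since unpacking a longer tuple into exactly two names raises ValueError.

```python
d = {('I', 'like'): 14, ('he', 'likes'): 2, ('I', 'hate'): 12}

def second_words(d, word):
    """Return the second element of every tuple key that starts with `word`."""
    # Indexing works for keys of any length >= 2, unlike `for first, second in d`.
    return [k[1] for k in d if len(k) > 1 and k[0] == word]

print(second_words(d, 'I'))  # ['like', 'hate']

# A generator expression is lazy; wrap it in list() to see its items.
gen = (second for first, second in d if first == 'I')
print(list(gen))  # ['like', 'hate']

# Hypothetical dict with longer keys: still only the second element is returned.
d_long = {('I', 'really', 'like'): 3, ('he', 'likes', 'it'): 1}
print(second_words(d_long, 'I'))  # ['really']
```

If the rest of the key is wanted rather than just the second element, k[1:] in place of k[1] would return the remaining items as a tuple.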
Let's assume that there is a dictionary like this one:
lst = {(1,1):2, (1,2):5, (1,3):10, (1,4):14, (1,6):22}
I want a simple (and the most efficient) function that returns the dictionary key whose value is the maximum.
For example:
key_for_max_value_in_dict(lst) = (1,6)
because the tuple (1,6) has the largest value (22).
I came up with this code which might be the most efficient one:
max(lst, key=lambda x: lst[x])
Use a comprehension for that like:
Code:
max((v, k) for k, v in lst.items())[1]
How does it work?
Iterate over the items() of the dict and emit them as (value, key) tuples, with the value first. max() can then find the largest value, because tuples compare element by element, starting with the first. Then take the second element ([1]) of the maximum tuple, since it is the key for the maximum value in the dict.
Test Code:
lst = {(1,1):2, (1,2):5, (1,3):10, (1,4):14, (1,6):22}
print(max((v, k) for k, v in lst.items())[1])
Results:
(1, 6)
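As a side-by-side check, the sketch below runs both approaches on the question's dict. One difference worth knowing (an observation about Python's max, not something either post states): on ties, the key= form returns the first maximal key encountered, while the (value, key) form breaks ties by comparing the keys themselves. The dict named tied is invented for illustration.

```python
lst = {(1, 1): 2, (1, 2): 5, (1, 3): 10, (1, 4): 14, (1, 6): 22}

by_key_func = max(lst, key=lst.get)                 # compares values only
by_tuples = max((v, k) for k, v in lst.items())[1]  # compares (value, key) pairs
print(by_key_func, by_tuples)  # (1, 6) (1, 6)

# Hypothetical dict with a tie on the maximum value.
tied = {(0, 9): 7, (1, 1): 7}
first_seen = max(tied, key=tied.get)                     # first key holding the max
largest_key = max((v, k) for k, v in tied.items())[1]    # tie broken by key order
print(first_seen, largest_key)  # (0, 9) (1, 1)
```

The tuple form also requires the keys to be comparable with each other whenever two values tie; the key= form never compares keys.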
Assuming you're using a regular unsorted dictionary, you'll need to walk down the entire thing once. Keep track of what the largest element is and update it if you see a larger one. If it is the same, add to the list.
largest_key = []
largest_value = 0
for key, value in lst.items():
    if value > largest_value:
        largest_value = value
        largest_key = [key]
    elif value == largest_value:
        largest_key.append(key)
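The loop above starts from largest_value = 0, which would miss the maximum if every value in the dict were negative. Here is a minimal sketch of the same single-pass idea that starts from None instead; the function name keys_with_max_value is made up here.

```python
def keys_with_max_value(d):
    """Return (keys sharing the largest value, that value) in a single pass."""
    largest_keys = []
    largest_value = None
    for key, value in d.items():
        if largest_value is None or value > largest_value:
            # New maximum: start a fresh list of keys.
            largest_value = value
            largest_keys = [key]
        elif value == largest_value:
            # Tie with the current maximum: remember this key too.
            largest_keys.append(key)
    return largest_keys, largest_value

lst = {(1, 1): 2, (1, 2): 5, (1, 3): 10, (1, 4): 14, (1, 6): 22}
print(keys_with_max_value(lst))  # ([(1, 6)], 22)
```

For an empty dict this sketch returns ([], None) rather than raising, which may or may not be the behavior you want.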
Description
I have two lists of lists which are derived from CSVs (minimal working example below). The real dataset is too large to do this manually.
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
Each row of mainlist contains a pair of identifiers (mainlist[x][0], mainlist[x][1]), and these are associated with two integers (mainlist[x][2] and mainlist[x][3]).
replacelist is a second list of lists which also contains the same pairs of identifiers (but not in the same order within a pair, or across rows). All sublist pairs are unique. Importantly, replacelist[x][2],replacelist[x][3] corresponds to a replacement for replacelist[x][0],replacelist[x][1], respectively.
I need to create a new third list, newlist which copies mainlist but replaces the identifiers with those from replacelist[x][2],replacelist[x][3]
For example, given:
mainlist[2] is: [JQ61,SQ48,186,284]
The matching pair in replacelist is
replacelist[4]: [SQ48,JQ61,QF87,QF63]
Therefore the expected output is
newlist[2] = [QF87,QF63,186,284]
More clearly put:
if replacelist = [[A, B, C, D]]
A is replaced with C, and B is replaced with D.
but it may appear in mainlist as [[B, A]]
Note that row positions in newlist are the same as in mainlist.
Attempt
What has me stumped on such a simple problem is that I can't use a basic list comprehension like [i for i in replacelist if i in mainlist], because the order within a pair changes, and if I use sorted(list) I lose the information about what to replace the pairs with. Current solution (with commented blanks):
newlist = []
for k in replacelist:
    for i in mainlist:
        if k[0] and k[1] in i:
            # retrieve mainlist order, then use some kind of indexing to check a series of nested if statements to work out positional replacement.
As you can see, this solution is clearly inefficient and I can't work out the best way to perform the final step in a few lines.
I can add more information if this is not clear
It would help if you had replacelist as a dict:
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
# Map each unordered pair of ids to a {old_id: new_id} dict
replacements = {frozenset(r[:2]): dict(zip(r[:2], r[2:])) for r in replacelist}
newlist = []
for *ids, val1, val2 in mainlist:
    reps = replacements[frozenset(ids)]
    newlist.append([reps[ids[0]], reps[ids[1]], val1, val2])
The first thing to do is transform both lists into dictionaries:
from collections import OrderedDict
maindct = OrderedDict((frozenset(item[:2]), item[2:]) for item in mainlist)
replacedct = {frozenset(item[:2]): item[2:] for item in replacelist}
# Now it is trivial to create a list with the desired output:
output_list = [replacedct[key] + maindct[key] for key in maindct]
The big win here is that by using a dictionary you eliminate the time spent searching the replacement list: with a list you have to scan the whole list for each item you have, so the running time grows with the square of the list length. With Python dictionaries, the lookup time is constant on average and does not depend on the data length.
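Putting the dictionary approach together with the question's sample data, as a sketch: the function name replace_pairs is hypothetical, and it assumes (as the question states) that every pair is unique and appears in both lists. A plain dict is used for the lookup, and the loop walks mainlist directly, which already preserves row order.

```python
def replace_pairs(mainlist, replacelist):
    """Swap each row's identifier pair for its replacement pair, keeping row order."""
    # frozenset keys make the lookup independent of the order within a pair.
    lookup = {frozenset(row[:2]): row[2:] for row in replacelist}
    return [lookup[frozenset(row[:2])] + row[2:] for row in mainlist]

mainlist = [["MH75", "QF12", 0, 38], ["JQ59", "QR21", 105, 191],
            ["JQ61", "SQ48", 186, 284], ["SQ84", "QF36", 0, 123],
            ["GA55", "VA63", 80, 245], ["MH98", "CX12", 171, 263]]
replacelist = [["MH75", "QF12", "BA89", "QR29"], ["QR21", "JQ59", "VA51", "MH52"],
               ["GA55", "VA63", "MH19", "CX84"], ["SQ84", "QF36", "SQ08", "JQ65"],
               ["SQ48", "JQ61", "QF87", "QF63"], ["MH98", "CX12", "GA34", "GA60"]]

newlist = replace_pairs(mainlist, replacelist)
print(newlist[2])  # ['QF87', 'QF63', 186, 284]
```

This version emits the replacement pair in the order it appears in replacelist, which matches the expected newlist[2] given in the question. A pair missing from replacelist would raise KeyError.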
I have a function h() that returns a tuple corresponding to the most common element in a list and its value from a dictionary called "Values" - so for example, if the most common element in list1 is a string "test" that occurs three times and that corresponds to Values = {"test":10}, then h(list1) = [3,10].
When two lists share the same element/frequency, I want to remove the most common element. Here is what I'm trying:
list1.remove([k for k,v in Values.items() if v == h(list1)[1]])
ValueError: list.remove(x): x not in list
How can I remove the key from a list based on its value in the Values dictionary?
remove expects a single element, not a list.
toremove = {k for k, v in Values.items() if v == h(list1)[1]}
# either:
for r in toremove:
    list1.remove(r)
# or (less efficient)
list1 = [i for i in list1 if i not in toremove]
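The two options do not behave identically: list.remove deletes only the first occurrence of an element, while the comprehension drops every occurrence. A minimal sketch of the difference follows. The question does not show h(), so the version below is a hypothetical stand-in built on collections.Counter, and the sample data is invented.

```python
from collections import Counter

Values = {"test": 10, "other": 4}            # invented sample data
list1 = ["test", "other", "test", "test"]

def h(lst):
    """Hypothetical stand-in: (count, value) of the most common element."""
    element, count = Counter(lst).most_common(1)[0]
    return count, Values[element]

target = h(list1)[1]  # compute once, before any list is modified
toremove = {k for k, v in Values.items() if v == target}

# Option 1: remove() deletes only the first occurrence of each element.
once = list(list1)
for r in toremove:
    once.remove(r)
print(once)  # ['other', 'test', 'test']

# Option 2: the comprehension drops every occurrence.
filtered = [i for i in list1 if i not in toremove]
print(filtered)  # ['other']
```

Note that remove raises ValueError if an element of toremove is not in the list, which can happen when Values holds another key with the same value.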
I was trying to use a list comprehension to replace multiple possible string values in a list of values.
I have a list of column names which are taken from a cursor.description;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
I then have header_replace;
{'MCB': 'SourceA', 'MCA': 'SourceB'}
I would like to replace the string values for header_replace.keys() found within the column names with the values.
I have had to use the following loop;
headers = []
for header in cursor.description:
    replaced = False
    for key in header_replace.keys():
        if key in header[0]:
            headers.append(str.replace(header[0], key, header_replace[key]))
            replaced = True
            break
    if not replaced:
        headers.append(header[0])
Which gives me the correct output;
['UNIX_Time', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
I tried using this list comprehension;
[str.replace(i[0],k,header_replace[k]) if k in i[0] else i[0] for k in header_replace.keys() for i in cursor.description]
But it meant that items were duplicated for the unmatched keys and I would get;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA',
'UNIX_Time', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB', 'col1_MCB', 'col2_MCB', 'col3_MCB']
But if instead I use;
[str.replace(i[0],k,header_replace[k]) for k in header_replace.keys() for i in cursor.description if k in i[0]]
#Bakuriu fixed syntax
I would get the correct replacement but then lose any items that didn't need a string replacement.
['col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
Is there a pythonesque way of doing this or am I over stretching list comprehensions? I certainly find them hard to read.
[str.replace(i[0],k,header_replace[k]) if k in i[0] for k in header_replace.keys() for i in cursor.description]
This is a SyntaxError, because conditional (if) expressions must contain the else part. You probably meant:
[i[0].replace(k, header_replace[k]) for k in header_replace for i in cursor.description if k in i[0]]
With the if at the end. However, I must say that list comprehensions with nested loops usually aren't the way to go.
I would use the expanded for loop. In fact I'd improve it removing the replaced flag:
headers = []
for header in cursor.description:
    for key, repl in header_replace.items():
        if key in header[0]:
            headers.append(header[0].replace(key, repl))
            break
    else:
        headers.append(header[0])
The else of the for loop is executed when no break is triggered during the iterations.
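A small self-contained illustration of for/else, with a made-up list of one-element tuples standing in for cursor.description (real cursor descriptions carry more fields per column; only index 0, the column name, matters here):

```python
header_replace = {'MCB': 'SourceA', 'MCA': 'SourceB'}
# Stand-in for cursor.description (assumption: the name is at index 0).
description = [('UNIX_Time',), ('col1_MCA',), ('col1_MCB',)]

headers = []
for column in description:
    name = column[0]
    for key, repl in header_replace.items():
        if key in name:
            headers.append(name.replace(key, repl))
            break  # leaving via break skips the else clause below
    else:
        # Runs only when the inner loop finished without hitting break.
        headers.append(name)

print(headers)  # ['UNIX_Time', 'col1_SourceB', 'col1_SourceA']
```

The else belongs to the inner for, not to the if, so its indentation has to line up with the for keyword.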
I don't understand why your code uses str.replace(string, substring, replacement) instead of string.replace(substring, replacement). Strings have instance methods, so use them as such and not as if they were static methods of the class.
If your data is exactly as you described it, you don't need nested replacements and can boil it down to this line:
l = ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
[i.replace('_MC', '_Source') for i in l]
['UNIX_Time',
 'col1_SourceA',
 'col2_SourceA',
 'col3_SourceA',
 'col1_SourceB',
 'col2_SourceB',
 'col3_SourceB']
I guess a function will be more readable:
def repl(key):
    for k, v in header_replace.items():
        if k in key:
            return key.replace(k, v)
    return key
print(list(map(repl, names)))
Another (less readable) option:
import re
rx = '|'.join(header_replace)
print([re.sub(rx, lambda m: header_replace[m.group(0)], name) for name in names])
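A slightly more defensive sketch of the regex variant, assuming names is the list of column names from the question (shortened here). It escapes the keys with re.escape in case they ever contain regex metacharacters, and compiles the pattern once; the helper name replace_all is made up.

```python
import re

header_replace = {'MCB': 'SourceA', 'MCA': 'SourceB'}
names = ['UNIX_Time', 'col1_MCA', 'col2_MCB']

# re.escape guards against keys that contain regex metacharacters.
pattern = re.compile('|'.join(re.escape(k) for k in header_replace))

def replace_all(name):
    """Replace every occurrence of any key with its mapped value."""
    return pattern.sub(lambda m: header_replace[m.group(0)], name)

result = [replace_all(n) for n in names]
print(result)  # ['UNIX_Time', 'col1_SourceB', 'col2_SourceA']
```

Unlike the loop versions, which stop at the first matching key, this replaces every key found in a name in a single pass.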
I wrote the code below, which works with a dictionary and a list:
d = computeRanks() # dictionary of id : interestRank pairs
lst = list(d) # tuples (id, interestRank)
interestingIds = []
for i in range(20):  # randomly choose 20 highly ranked ids
    choice = randomWeightedChoice(d.values())  # returns random index from list
    interestingIds.append(lst[choice][0])
There may be an error here, because I'm not sure whether the indices in lst and d.values() correspond.
Do you know how to write this better?
One of the policies of dict is that the results of dict.keys() and dict.values() will correspond so long as the contents of the dictionary are not modified.
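A quick way to convince yourself of this guarantee, using an invented dict: as long as the dict is not modified in between, position i in keys() pairs with position i in values().

```python
d = {'a': 3, 'b': 1, 'c': 2}  # invented sample data

keys = list(d.keys())
values = list(d.values())

# Pairing keys and values by position reproduces items() exactly.
assert list(zip(keys, values)) == list(d.items())

# So an index found in the values can be used to look up the matching key.
index = values.index(max(values))
print(keys[index])  # a
```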
As #Ignacio says, the index choice does correspond to the intended element of lst, so your code's logic is correct. But your code should be much simpler: d already contains IDs for the elements, so rewrite randomWeightedChoice to take a dictionary and return an ID.
Perhaps it will help you to know that you can iterate over a dictionary's key-value pairs with d.items():
for k, v in d.items():
etc.
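Since randomWeightedChoice is not shown in the question, here is a sketch of the suggested simplification that uses the standard library's random.choices (Python 3.6+) instead. It assumes the ranks are non-negative numbers that can serve directly as weights; the function name pick_interesting_ids and the sample ranks are made up.

```python
import random

def pick_interesting_ids(ranks, count=20):
    """Draw `count` ids at random, weighted by rank (with replacement)."""
    ids = list(ranks)                   # keys, in dict order
    weights = [ranks[i] for i in ids]   # looked up by key, so pairing is explicit
    return random.choices(ids, weights=weights, k=count)

ranks = {'id1': 5.0, 'id2': 1.0, 'id3': 0.5}  # invented sample data
interesting_ids = pick_interesting_ids(ranks)
print(len(interesting_ids))  # 20
```

Like the original loop, this draws with replacement, so the same id can appear more than once.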