dlist=['All my loving','All my bros','And all sis']
I would like to create a dictionary in which each word (as a key) maps to the set of indices of the dlist entries in which that word appears.
For example,
'All':{0,1}, 'my':{0,1}, 'sis':{2} etc.
Somehow this does not work:
dict={}
{w:{num} if w not in dict.keys() else dict[w].add(num) for (num,strn) in enumerate(dlist) for w in strn.split()}
This returns
{'All':{2}, 'my':{2}}
It looks like the else branch is being ignored. Any pointers?
Thanks
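This doesn't work because the comprehension never updates dict. The name dict refers to the empty dictionary created on the line before; the comprehension builds a separate, new dictionary that has no name until it is finished. So w not in dict.keys() is always true, the else branch never runs, and each later occurrence of a word simply overwrites the earlier entry with a new one-element set. (Even if the else branch did run, set.add returns None, so the value stored would be None.) In a for loop you update the dictionary yourself on every iteration, so the membership test sees the words added earlier.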
Something like this should work:
myDict = {}
for (num, strn) in enumerate(dlist):
    for w in strn.split():
        if w not in myDict:
            myDict[w] = {num}
        else:
            myDict[w].add(num)
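The same loop can be written without the membership test by letting collections.defaultdict create the empty set on first access. This is only a sketch of that variation; the name word_index is made up for illustration and is not from the question.

```python
from collections import defaultdict

dlist = ['All my loving', 'All my bros', 'And all sis']

# word_index is a hypothetical name; each missing key starts as an empty set
word_index = defaultdict(set)
for num, strn in enumerate(dlist):
    for w in strn.split():
        word_index[w].add(num)

print(dict(word_index))
```

Note that 'All' and 'all' are different keys here, because the split is case-sensitive.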
I have a dictionary where the values are a list of tuples.
dictionary = {1:[('hello, how are you'),('how is the weather'),('okay then')], 2:[('is this okay'),('maybe It is')]}
I want to make the values a single string for each key. So I made a function which does the job, but I do not know how to insert the result back into the original dictionary.
my function:
def list_of_tuples_to_string(dictionary):
    for tup in dictionary.values():
        k = [''.join(i) for i in tup] #joining list of tuples to make a list of strings
        l = [''.join(k)] #joining list of strings to make a string
        for j in l:
            ki = j.lower() #converting string to lower case
    return ki
output i want:
dictionary = {1:'hello, how are you how is the weather okay then', 2:'is this okay maybe it is'}
You can simply overwrite the values for each key in the dictionary:
for key, value in dictionary.items():
    dictionary[key] = ' '.join(value)
Note the space in the join statement, which joins each string in the list with a space.
It can be done even more simply, using a dict comprehension:
>>> dictionary = {1:[('hello, how are you'),('how is the weather'),('okay then')],
...               2:[('is this okay'),('maybe It is')]}
>>> dictionary = {key:' '.join(val).lower() for key, val in dictionary.items()}
>>> print(dictionary)
{1: 'hello, how are you how is the weather okay then', 2: 'is this okay maybe it is'}
Now, let's go through the method:
we loop through the keys and values in the dictionary with dict.items(),
then assign each key a string built from the elements of its list.
The elements are joined with a single space and converted to lowercase.
Try:
for i in dictionary.keys():
    dictionary[i]=' '.join(updt_key.lower() for updt_key in dictionary[i])
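Since the question asked for a function, here is a minimal sketch that wraps the same join in one and returns a new dictionary, which is then assigned back. The name join_values is hypothetical. It assumes the values are lists of plain strings, which is what the sample data actually contains: ('hello, how are you') is a string, not a tuple, because parentheses without a trailing comma do not create a tuple.

```python
def join_values(mapping):
    """Return a new dict whose values are the space-joined, lowercased strings."""
    return {key: ' '.join(parts).lower() for key, parts in mapping.items()}


dictionary = {1: ['hello, how are you', 'how is the weather', 'okay then'],
              2: ['is this okay', 'maybe It is']}

# Rebinding the name replaces the old lists with the joined strings
dictionary = join_values(dictionary)
print(dictionary)
```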
I often used collections.defaultdict to be able to append an element to d[key] without having to initialize it first to [] (benefit: you don't need to do: if key not in d: d[key] = []):
import collections, random
d = collections.defaultdict(list)
for i in range(100):
    j = random.randint(0,20)
    d[j].append(i) # if d[j] does not exist yet, initialize it to [], so we can use append directly
Now I realize we can simply use a normal dict and setdefault:
import random
d = {}
for i in range(100):
    j = random.randint(0,20)
    d.setdefault(j, []).append(i)
Question: when using a dict whose values are lists, is there a good reason to use a collections.defaultdict instead of the second method (using a simple dict and setdefault), or are they purely equivalent?
collections.defaultdict is generally faster: it is implemented in C and designed for exactly this task, whereas d.setdefault(j, []) has to construct a new empty list on every call, even when the key already exists. However, you should use dict.setdefault if you want accessing an absent key in your resulting dictionary to raise a KeyError rather than insert an empty list. This is the most important practical difference.
In addition to the answer by Chris_Rands, I want to further emphasize that a primary reason to use defaultdict is if you want key accesses to always succeed, and to insert the default value if there was none.
This can be for any reason, and a completely valid one is the convenience of being able to use [] instead of having to call dict.setdefault before every access.
Also note that key in default_dict will still return False if that key has never been accessed before, so you can still check for existence of keys in a defaultdict if necessary. This allows appending to the lists without checking for their existence, but also checking for the existence of the lists if necessary.
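The difference described above can be seen in a short sketch. The variable names (dd, plain and the three flags) are chosen here for illustration only.

```python
from collections import defaultdict

dd = defaultdict(list)
plain = {}

dd['a'].append(1)                     # no initialisation needed
plain.setdefault('a', []).append(1)   # same effect with a plain dict

present_before = 'b' in dd   # membership test does not create the key
_ = dd['b']                  # a bare lookup inserts an empty list
present_after = 'b' in dd

try:
    plain['b']               # a plain dict raises instead of inserting
    raised = False
except KeyError:
    raised = True

print(present_before, present_after, raised)
```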
With defaultdict you can also use in-place addition:
import collections, random
d = collections.defaultdict(list)
for i in range(100):
    j = random.randint(0,20)
    d[j] += [i]
There is no equivalent construction like d.setdefault(j, []) += [i]; it raises a SyntaxError, because a function call is not a valid target for augmented assignment.
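With a plain dict, one workaround is to bind the list returned by setdefault to a name first and apply += to that name. This is a sketch of that idea; bucket is a name introduced here, not part of the original answer.

```python
import random

d = {}
for i in range(100):
    j = random.randint(0, 20)
    bucket = d.setdefault(j, [])  # the list stored in d[j], created if absent
    bucket += [i]                 # list += extends in place, so d[j] sees the change
```

This works because += on a list mutates the existing object rather than creating a new one, so bucket and d[j] remain the same list.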
I was trying to use a list comprehension to replace multiple possible string values in a list of values.
I have a list of column names which are taken from a cursor.description;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
I then have header_replace;
{'MCB': 'SourceA', 'MCA': 'SourceB'}
I would like to replace the string values for header_replace.keys() found within the column names with the values.
I have had to use the following loop;
headers = []
for header in cursor.description:
    replaced = False
    for key in header_replace.keys():
        if key in header[0]:
            headers.append(str.replace(header[0], key, header_replace[key]))
            replaced = True
            break
    if not replaced:
        headers.append(header[0])
Which gives me the correct output;
['UNIX_Time', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
I tried using this list comprehension;
[str.replace(i[0],k,header_replace[k]) if k in i[0] else i[0] for k in header_replace.keys() for i in cursor.description]
But it meant that items were duplicated for the unmatched keys and I would get;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA',
'UNIX_Time', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB', 'col1_MCB', 'col2_MCB', 'col3_MCB']
But if instead I use;
[str.replace(i[0],k,header_replace[k]) for k in header_replace.keys() for i in cursor.description if k in i[0]]
#Bakuriu fixed syntax
I would get the correct replacement but then lose any items that didn't need a string replacement.
['col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
Is there a Pythonic way of doing this, or am I overstretching list comprehensions? I certainly find them hard to read.
[str.replace(i[0],k,header_replace[k]) if k in i[0] for k in header_replace.keys() for i in cursor.description]
This is a SyntaxError, because a conditional expression (x if cond else y) must contain the else part. You probably meant:
[i[0].replace(k, header_replace[k]) for k in header_replace for i in cursor.description if k in i[0]]
With the if at the end. However, I must say that list comprehensions with nested loops usually aren't the way to go.
I would use the expanded for loop. In fact, I'd improve it by removing the replaced flag:
headers = []
for header in cursor.description:
    for key, repl in header_replace.items():
        if key in header[0]:
            headers.append(header[0].replace(key, repl))
            break
    else:
        headers.append(header[0])
The else of the for loop is executed when no break is triggered during the iterations.
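To try the for/else version without a database, here is a self-contained sketch. The description list is a hypothetical stand-in for cursor.description, modelled as tuples whose first element is the column name; a real cursor.description carries more fields per column.

```python
header_replace = {'MCB': 'SourceA', 'MCA': 'SourceB'}

# Hypothetical stand-in for cursor.description: only the name field is modelled
description = [(name,) for name in
               ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA',
                'col1_MCB', 'col2_MCB', 'col3_MCB']]

headers = []
for header in description:
    for key, repl in header_replace.items():
        if key in header[0]:
            headers.append(header[0].replace(key, repl))
            break
    else:
        # reached only when the inner loop finished without hitting break
        headers.append(header[0])

print(headers)
```

Every column appears exactly once in headers: either replaced in the if branch, or copied unchanged in the else branch.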
I don't understand why your code uses str.replace(string, substring, replacement) instead of string.replace(substring, replacement). Strings have instance methods, so use them as such and not as if they were static methods of the class.
If your data is exactly as you described it, you don't need nested replacements and can boil it down to this line:
l = ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
[i.replace('_MC', '_Source') for i in l]
['UNIX_Time',
 'col1_SourceA',
 'col2_SourceA',
 'col3_SourceA',
 'col1_SourceB',
 'col2_SourceB',
 'col3_SourceB']
I guess a function will be more readable:
def repl(key):
    for k, v in header_replace.items():
        if k in key:
            return key.replace(k, v)
    return key
print(list(map(repl, names)))
Another (less readable) option:
import re
rx = '|'.join(header_replace)
print([re.sub(rx, lambda m: header_replace[m.group(0)], name) for name in names])
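A slightly more defensive sketch of the regex approach, assuming names holds the plain column-name strings: the keys are passed through re.escape so that any regex metacharacters in them are matched literally, and the pattern is compiled once. The names pattern and renamed are introduced here for illustration.

```python
import re

header_replace = {'MCB': 'SourceA', 'MCA': 'SourceB'}
names = ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA',
         'col1_MCB', 'col2_MCB', 'col3_MCB']

# One alternation of all keys, each escaped so it is treated as literal text
pattern = re.compile('|'.join(re.escape(key) for key in header_replace))

# The lambda looks up the replacement for whichever key matched
renamed = [pattern.sub(lambda m: header_replace[m.group(0)], name)
           for name in names]
print(renamed)
```

Unlike the loop versions, which stop at the first key found in a name, sub replaces every occurrence of every key in a single pass.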
I have a list in the following format:
['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d',
'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
I want to create a new list which looks like this:
['CASE_1:a,b,c,d','CASE_2:e,f,g,h']
Any idea how to get this done elegantly??
You can use a defaultdict by treating case as the key, and appending to the list each letter, where case and the letter are obtained by splitting the elements of your list on ':' - such as:
from collections import defaultdict
case_letters = defaultdict(list)
start = ['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d', 'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
for el in start:
    case, letter = el.split(':')
    case_letters[case].append(letter)
result = sorted('{case}:{letters}'.format(case=key, letters=','.join(values)) for key, values in case_letters.items())
print(result)
As this is homework (edit: or was!!?) - I recommend looking at collections.defaultdict, str.split (and the other builtin string methods), the builtin list type and its methods (such as append, extend, sort etc...), str.format, the builtin sorted function, and dicts in general. Use the working example here along with the manual for reference - all these things will come in handy later on - so it's in your best interest to understand them as best you can.
One other thing to consider is that having something like:
{1: ['a', 'b', 'c', 'd'], 2: ['e', 'f', 'g', 'h']}
is a much more useful format, and it could be used to recreate your desired list afterwards anyway...
I've deleted my full solution since I realized this is homework, but here's the basic idea:
A dictionary is a better data structure. I would look at a collections.defaultdict. e.g.
yourdict = defaultdict(list)
You can iterate through your list (splitting each element on ':'). Something like:
#only split string once -- resulting in a list of length 2.
case, value = element.split(':',1)
Then you can add these to the dict using the list .append method:
yourdict[case].append(value)
Now, you'll have a dict which maps keys (Case_1, Case_2) to lists (['a','b','c','d'], [...]).
If you really need a list, you can sort the items of the dictionary and join appropriately.
sigh. It looks like the homework tag has been removed (here's my original solution):
from collections import defaultdict
d = defaultdict(list)
for elem in yourlist:
    case, value = elem.split(':', 1)
    d[case].append(value)
Now you have a dictionary as I described above. If you really want to get your list back:
new_lst = [ case+':'+','.join(values) for case,values in sorted(d.items()) ]
data = ['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d', 'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
output = {}
for item in data:
    key, value = item.split(':')
    if key not in output:
        output[key] = []
    output[key].append(value)
result = []
for key, values in output.items():
    result.append('%s:%s' % (key, ",".join(values)))
print(result)
outputs (on Python 3.7+, where dicts keep insertion order; older versions may print the two entries in the other order)
['CASE_1:a,b,c,d', 'CASE_2:e,f,g,h']
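mydict = {}
for item in lst:  # lst is the input list; avoid naming it list, which shadows the builtin
    key, value = item.split(":")
    if key in mydict:
        mydict[key].append(value)
    else:
        mydict[key] = [value]
[key + ":" + ",".join(value) for key, value in mydict.items()]
Not much elegance, to be honest. I'd store your list as a dict in the first place, because it effectively behaves as one.
The output is ['CASE_1:a,b,c,d', 'CASE_2:e,f,g,h'] on Python 3.7+, where dicts keep insertion order; on older versions the two entries may appear in the other order.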
I wrote the below code working with dictionary and list:
d = computeRanks() # dictionary of id : interestRank pairs
lst = list(d.items()) # tuples (id, interestRank)
interestingIds = []
for i in range(20): # randomly choose 20 highly ranked ids
    choice = randomWeightedChoice(d.values()) # returns random index from list
    interestingIds.append(lst[choice][0])
There may be an error here, because I'm not sure whether the indices in lst correspond to those in d.values().
Do you know how to write this better?
dict guarantees that the results of dict.keys() and dict.values() correspond to each other as long as the dictionary is not modified in between.
As @Ignacio says, the index choice does correspond to the intended element of lst, so your code's logic is correct. But your code should be much simpler: d already contains IDs for the elements, so rewrite randomWeightedChoice to take a dictionary and return an ID.
Perhaps it will help you to know that you can iterate over a dictionary's key-value pairs with d.items():
for k, v in d.items():
etc.
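As one possible way to return IDs directly, here is a sketch that swaps the question's randomWeightedChoice for the standard library's random.choices (available since Python 3.6), using the ranks as weights. The sample dictionary is a made-up stand-in for the result of computeRanks(), and it assumes the ranks are non-negative numbers.

```python
import random

# Hypothetical stand-in for computeRanks(): id -> interestRank
d = {'id_a': 5.0, 'id_b': 1.0, 'id_c': 3.0}

ids = list(d)                # keys, in the dictionary's iteration order
weights = list(d.values())   # same order as ids, since d is not modified in between

# Draw 20 ids with probability proportional to their rank (with replacement)
interesting_ids = random.choices(ids, weights=weights, k=20)
print(interesting_ids)
```

Like the original loop, this samples with replacement, so the same ID can appear more than once.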