def big(dict, n):
    line = []
    for k in dict:
        if k > n:
            line.append(k)
            return line
I have to find all the elements in dict larger than n.
However, my code only returns the largest number in dict larger than n.
What do I need to do in order to make it correct?
The return line is tabbed too far over, so it returns when the first key larger than n is found (Note: a dictionary isn't ordered by the way you write it), rather than going over all keys before returning. Try:
def big(dic, n):
    line = []
    for k in dic:
        if k > n:
            line.append(k)
    return line
In fact, you might prefer it to use list comprehension (and the function becomes just one line).
def big(dic, n):
    return [k for k in dic if k>n]
Dictionaries consist of key-value pairs, {key: value}, and when we iterate over a dictionary we are iterating over its keys. This explains the use of the variable k to iterate over the keys. That is,
[k for k in dic] = [key1, key2, ...]
Hence, if you want the values in the dictionary that are larger than n, rather than the keys, you can use:
return [dic[k] for k in dic if dic[k]>n]
Note: I've changed the variable name to dic since (as @AndrewJaffe mentions) dict is a built-in object; rebinding it here may cause unexpected behaviour and is generally considered bad practice. For example, it would break a check such as type(dic)==dict.
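A minimal sketch of the kind of surprise that shadowing can cause. The function name shadowing_demo is made up for illustration; the point is only that, inside the function, the parameter named dict hides the built-in type:

```python
def shadowing_demo(dict):
    # Inside this function, 'dict' is the argument, not the built-in type.
    try:
        # Intended: build a new dictionary from a list of pairs.
        # Actual: this tries to *call* the dictionary that was passed in.
        return dict([("a", 1)])
    except TypeError as exc:
        return "TypeError: {}".format(exc)


result = shadowing_demo({"x": 1})
print(result)
```

Renaming the parameter (to dic or d, as in the answers here) makes the call to the built-in work again.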
Naively iterating over a dictionary gives you a sequence of keys, not values.
So to do what you want, you need itervalues (in Python 3, values):
for k in d.itervalues(): ### call it "d" rather than "dict"
    if k>n:
        line.append(k)
Or, as others have pointed out, use a list comprehension.
Also, don't use dict for the name, as it shadows a builtin.
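To make the key/value distinction concrete, here is a small sketch of the three variants as list comprehensions. The function names and the sample dictionary are illustrative assumptions, not taken from the question:

```python
def big_keys(d, n):
    # Iterating over a dictionary yields its keys.
    return [k for k in d if k > n]


def big_values(d, n):
    # values() yields the values instead.
    return [v for v in d.values() if v > n]


def keys_with_big_values(d, n):
    # items() yields (key, value) pairs, so either side can be tested or returned.
    return [k for k, v in d.items() if v > n]


sample = {1: 50, 20: 3, 30: 40}
print(sorted(big_keys(sample, 10)))
print(sorted(big_values(sample, 10)))
print(sorted(keys_with_big_values(sample, 10)))
```

The results are sorted before printing only so that the output does not depend on dictionary ordering.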
def big(dic, n):
    line = []
    for k in dic:
        if dic[k]> n: #compare value instead of key
            line.append(k) #use k if you're appending key else dic[k] for val
    return line
output (Python 3.7+, where dicts keep insertion order):
>>> print(big({'a':10,'b':15, 'c':12},11))
['b', 'c']
Move the return statement back two indentation levels; otherwise the function will return at the first value larger than n.
I often used collections.defaultdict to be able to append an element to d[key] without having to initialize it first to [] (benefit: you don't need to do: if key not in d: d[key] = []):
import collections, random
d = collections.defaultdict(list)
for i in range(100):
    j = random.randint(0,20)
    d[j].append(i) # if d[j] does not exist yet, initialize it to [], so we can use append directly
Now I realize we can simply use a normal dict and setdefault:
import random
d = {}
for i in range(100):
    j = random.randint(0,20)
    d.setdefault(j, []).append(i)
Question: when using a dict whose values are lists, is there a good reason to use a collections.defaultdict instead of the second method (using a simple dict and setdefault), or are they purely equivalent?
collections.defaultdict is generally more performant: it is optimised for exactly this task and implemented in C. However, you should use dict.setdefault if you want accessing an absent key in your resulting dictionary to raise a KeyError rather than insert an empty list. This is the most important practical difference.
In addition to the answer by Chris_Rands, I want to further emphasize that a primary reason to use defaultdict is if you want key accesses to always succeed, and to insert the default value if there was none.
This can be for any reason, and a completely valid one is the convenience of being able to use [] instead of having to call dict.setdefault before every access.
Also note that key in default_dict will still return False if that key has never been accessed before, so you can still check for existence of keys in a defaultdict if necessary. This allows appending to the lists without checking for their existence, but also checking for the existence of the lists if necessary.
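The behavioural difference described in these answers can be checked with a short sketch (the variable names are illustrative):

```python
from collections import defaultdict

dd = defaultdict(list)
present_before = "x" in dd   # a membership test does not create the key
_ = dd["x"]                  # a plain lookup inserts an empty list
present_after = "x" in dd

plain = {}
plain.setdefault("a", []).append(1)   # creates the list only where asked
try:
    plain["missing"]                  # a plain dict still raises on absent keys
    raised = False
except KeyError:
    raised = True

print(present_before, present_after, raised)
```

So a defaultdict grows on every lookup of a missing key, while a dict filled through setdefault keeps the usual KeyError for keys that were never set.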
When using defaultdict you can also do in-place addition:
import collections, random
d = collections.defaultdict(list)
for i in range(100):
    j = random.randint(0,20)
    d[j] += [i]
There is no equivalent construction with setdefault: d.setdefault(j, []) += [i] is a SyntaxError, because you cannot assign to a function call.
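With a plain dict, one possible way to get the same effect is to call a mutating list method on the value that setdefault returns. This is a sketch, not the only option:

```python
import random

d = {}
for i in range(100):
    j = random.randint(0, 20)
    # setdefault returns the list stored under j (creating it if needed),
    # and extend mutates that list in place, much like += on a list would.
    d.setdefault(j, []).extend([i])

total = sum(len(v) for v in d.values())
print(total)
```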
Why isn't this code working? I'm trying to get the items whose value equals their key.
L=[0,2,2,1,5,5,6,10]
x=dict(enumerate(L))
y=(filter(x.keys()==x.values(), x.items()))
print(list(y))
The keys() method returns a view of all of the keys.
The values() method returns a view of all of the values.
So, x.keys()==x.values() is asking whether all of the keys equal all of the values, which is of course not true.
Also, filter wants a function. But you're not passing it a function, you're just passing it the result of x.keys()==x.values(), or False. To turn that into a function, you'd need to use def or lambda to create a new function.
The function you want to create is a function that takes an item, and returns true if the key equals the value. Since an item is just a 2-element tuple with the key and value for that item, the function to check that is:
y = filter((lambda item: item[0] == item[1]), x.items())
Or, if that's a bit too confusing, don't try to write it inline; just def it separately:
def key_equals_value(item):
    key, value = item
    return key == value
y = filter(key_equals_value, x.items())
However, this is pretty clumsy; it's much easier to write it as a comprehension than a filter call:
y = ((key, value) for (key, value) in x.items() if key == value)
As a general rule, whenever you don't already have a function to pass to filter or map, and would have to create one with def or lambda, a comprehension will usually be more readable, because you can just write the expression directly.
And, if you want a list rather than a generator, you can do that with a comprehension just by changing the parens to square brackets:
y = [(key, value) for (key, value) in x.items() if key == value]
And, if you want just the values, not the key-value pairs:
y = [value for (key, value) in x.items() if key == value]
If you find yourself confused by comprehensions, they can always be converted into nested statements, with an append at the bottom. So, that last one is equivalent to:
y = []
for key, value in x.items():
    if key == value:
        y.append(value)
Also, you don't really need a dict here in the first place; you just want to iterate over the index, value pairs. So:
y = [value for (index, value) in enumerate(L) if index == value]
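Putting the pieces together with the list from the question, a runnable sketch of the filter version, the comprehension version and the enumerate version side by side (the variable names via_filter, via_comprehension and values_only are illustrative):

```python
L = [0, 2, 2, 1, 5, 5, 6, 10]
x = dict(enumerate(L))

# filter needs a function; the lambda receives one (key, value) tuple at a time.
via_filter = list(filter(lambda item: item[0] == item[1], x.items()))

# The same test written directly as a comprehension.
via_comprehension = [(key, value) for key, value in x.items() if key == value]

# No dict needed: compare each index with its value.
values_only = [value for index, value in enumerate(L) if index == value]

print(via_filter)
print(values_only)
```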
I have written a simple script the scope of which is:
list=[1,19,46,28 etc...]
dictionary={Joey:(10,2,6,19), Emily: (0,3), etc}
Now I need to find all the keys of the dictionary which have at least one of the list entries in the values
Example: 19 is in Joey's values, so Joey is the winner.
How I did it: (I'm not a programmer at all)
# NodesOfSet = the list
# elementsAndTheirNodes = the dictionary
# loop as many times as the number of key:value entries in the dictionary element:nodes
# simply: loop over all the elements
for i in range (0, len (elementsAndTheirNodes.keys())):
    # there is an indent here (otherwise it wouldnt work anyway)
    # loop over the tuple that serves as the value for each key for a given i-th key:value
    # simply: loop over all their nodes
    for j in range (0, len (elementsAndTheirNodes.values()[i])):
        # test: this prints out element + 1 node and so on
        # print (elementsAndTheirNodes.keys()[i], elementsAndTheirNodes.values()[i][j] )
        for k in range (0, len (NodesOfSet)):
            if NodesOfSet[k] == (elementsAndTheirNodes.values()[i][j]):
                print ( elementsAndTheirNodes.keys()[i], " is the victim")
            else:
                print ( elementsAndTheirNodes.keys()[i], " is not the victim")
But this is very time consuming as it iterates over basically everything in the database. May I ask for help optimizing this? Thanks!
I would use a list comprehension and the builtin any, which short-circuits once a shared item is found. Turning your list into a set reduces the average complexity of each membership lookup from O(n) to O(1):
s = set(lst)
result = [k for k, v in dct.items() if any(i in s for i in v)]
Be careful to not assign builtins as the names for your objects (e.g. list) to avoid making the builtin unusable later on in your code.
Don't use the name list; list is the name of a built-in type.
l = [1, 19, 46, 28, ...]
l_set = set(l)
d = {'Joey':(10,2,6,19), 'Emily': (0,3), ...}
winners = [k for k, v in d.items() if any(i in l_set for i in v)]
any will stop iterating through v as soon as it "sees" a shared value, saving some time.
You could also use set intersection to check if any of the elements in the dictionary value tuples have anything in common with your "list" entries:
l = [1,19,46,28, ...]
s = set(l)
d = {'Joey':(10,2,6,19), 'Emily': (0,3), ...}
winners = [k for k, v in d.items() if s.intersection(v)]
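A self-contained sketch of the set-based approaches from these answers. The data is the question's example with the elided entries simply left out, and set.isdisjoint is an extra variant not mentioned in the answers: like any, it can stop at the first shared element instead of building a full intersection.

```python
nodes = [1, 19, 46, 28]
elements = {"Joey": (10, 2, 6, 19), "Emily": (0, 3)}

node_set = set(nodes)  # build the set once, outside the loop

# any() stops scanning a tuple as soon as one shared value is found.
winners_any = [k for k, v in elements.items() if any(i in node_set for i in v)]

# Builds the intersection and tests whether it is non-empty.
winners_intersection = [k for k, v in elements.items() if node_set.intersection(v)]

# Asks directly whether the two collections share any element.
winners_disjoint = [k for k, v in elements.items() if not node_set.isdisjoint(v)]

print(winners_any)
```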
I was trying to use a list comprehension to replace multiple possible string values in a list of values.
I have a list of column names which are taken from a cursor.description;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
I then have header_replace;
{'MCB': 'SourceA', 'MCA': 'SourceB'}
I would like to replace the string values for header_replace.keys() found within the column names with the values.
I have had to use the following loop;
headers = []
for header in cursor.description:
    replaced = False
    for key in header_replace.keys():
        if key in header[0]:
            headers.append(str.replace(header[0], key, header_replace[key]))
            replaced = True
            break
    if not replaced:
        headers.append(header[0])
Which gives me the correct output;
['UNIX_Time', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
I tried using this list comprehension;
[str.replace(i[0],k,header_replace[k]) if k in i[0] else i[0] for k in header_replace.keys() for i in cursor.description]
But it meant that items were duplicated for the unmatched keys and I would get;
['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_SourceA', 'col2_SourceA', 'col3_SourceA',
'UNIX_Time', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB', 'col1_MCB', 'col2_MCB', 'col3_MCB']
But if instead I use;
[str.replace(i[0],k,header_replace[k]) for k in header_replace.keys() for i in cursor.description if k in i[0]]
@Bakuriu fixed syntax
I would get the correct replacement but then lose any items that didn't need a string replacement.
['col1_SourceA', 'col2_SourceA', 'col3_SourceA', 'col1_SourceB', 'col2_SourceB', 'col3_SourceB']
Is there a Pythonic way of doing this, or am I overstretching list comprehensions? I certainly find them hard to read.
[str.replace(i[0],k,header_replace[k]) if k in i[0] for k in header_replace.keys() for i in cursor.description]
this is a SyntaxError, because if expressions must contain the else part. You probably meant:
[i[0].replace(k, header_replace[k]) for k in header_replace for i in cursor.description if k in i[0]]
With the if at the end. However, I must say that list comprehensions with nested loops aren't usually the way to go.
I would use the expanded for loop. In fact I'd improve it removing the replaced flag:
headers = []
for header in cursor.description:
    for key, repl in header_replace.items():
        if key in header[0]:
            headers.append(header[0].replace(key, repl))
            break
    else:
        headers.append(header[0])
The else of the for loop is executed when no break is triggered during the iterations.
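The for/else pattern can be tried without a database cursor. In this sketch, rename_headers is a hypothetical helper, the names are plain strings rather than cursor.description tuples, and the mapping is sample data chosen to reproduce the expected output shown in the question:

```python
def rename_headers(names, mapping):
    renamed = []
    for name in names:
        for key, repl in mapping.items():
            if key in name:
                renamed.append(name.replace(key, repl))
                break  # first matching key wins; skips the else branch
        else:
            # Runs only when the inner loop finished without a break,
            # i.e. no key matched this name.
            renamed.append(name)
    return renamed


names = ["UNIX_Time", "col1_MCA", "col2_MCA", "col3_MCA",
         "col1_MCB", "col2_MCB", "col3_MCB"]
mapping = {"MCA": "SourceA", "MCB": "SourceB"}
headers = rename_headers(names, mapping)
print(headers)
```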
I don't understand why in your code you use str.replace(string, substring, replacement) instead of string.replace(substring, replacement). Strings have instance methods, so use them as such and not as if they were static methods of the class.
If your data is exactly as you described it, you don't need nested replacements and can boil it down to this line:
>>> l = ['UNIX_Time', 'col1_MCA', 'col2_MCA', 'col3_MCA', 'col1_MCB', 'col2_MCB', 'col3_MCB']
>>> [i.replace('_MC', '_Source') for i in l]
['UNIX_Time',
 'col1_SourceA',
 'col2_SourceA',
 'col3_SourceA',
 'col1_SourceB',
 'col2_SourceB',
 'col3_SourceB']
I guess a function will be more readable:
def repl(key):
    for k, v in header_replace.items():
        if k in key:
            return key.replace(k, v)
    return key
print map(repl, names)
Another (less readable) option:
import re
rx = '|'.join(header_replace)
print [re.sub(rx, lambda m: header_replace[m.group(0)], name) for name in names]
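A Python 3 sketch of the regex option, with two additions that are assumptions rather than part of the answer: the keys are passed through re.escape in case they contain regex metacharacters, and the logic is wrapped in a hypothetical helper replace_all. Unlike the loop versions, this replaces every occurrence of every key in a name, not just the first key that matches.

```python
import re


def replace_all(names, mapping):
    # One alternation pattern that matches any of the keys literally.
    pattern = re.compile("|".join(re.escape(key) for key in mapping))
    # The callback looks up the replacement for whichever key matched.
    return [pattern.sub(lambda m: mapping[m.group(0)], name) for name in names]


sample_names = ["UNIX_Time", "col1_MCA", "col1_MCB"]
sample_mapping = {"MCA": "SourceA", "MCB": "SourceB"}
replaced = replace_all(sample_names, sample_mapping)
print(replaced)
```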
I have a dict like this:
d = {'a':'b+c', 'b':'f+g', 'f':'y+u'}
I want to recursively replace the letters in the values that are also keys, so I end up with:
d = {'a':'y+u+g+c', 'b':'y+u+g', 'f':'y+u'}
I tried using this code:
def getval(key,d):
    if d.has_key(key):
        temp=re.findall('\w+',d[key])
        for i in range(len(temp)):
            if d.has_key(temp[i]):
                getval(temp[i],d)
            else:
                continue
for k,v in d.iteritems():
    temp=re.findall('\w+',d[k])
    for i in range(len(temp)):
        if d.has_key(temp[i]):
            getval(temp[i],d)
But it doesn't work. How can I do it? My real dictionary is much larger, but definitely doesn't contain any cycles.
I am actually not sure recursion is the most appropriate method here. Here is a solution that makes replacements in a loop until none of the replacements change the current value:
import re
def make_replacements(d):
    r = d.copy()
    regex = dict((k, re.compile(r'\b' + re.escape(k) + r'\b')) for k in r)
    for k in r:
        done = False
        while not done:
            done = True
            for k2 in r:
                n = regex[k2].sub(r[k2], r[k])
                if n != r[k]:
                    r[k] = n
                    done = False
    return r
print make_replacements({'a': 'b+c', 'b': 'f+g', 'f': 'y+u'})
# {'a': 'y+u+g+c', 'b': 'y+u+g', 'f': 'y+u'}
Note that this doesn't detect any loops in the input, so if you give it something like {'a':'b+c','b':'c+a','c':'a+b'} it will enter an infinite loop (although it sounds like this should never happen from your comment).
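If cyclic input is a concern, one possible alternative is a recursive expansion that caches finished keys and tracks the keys currently being expanded. This is a Python 3 sketch, not the method of the answer above: the name expand is hypothetical, and it assumes the symbols are \w+ words, as in the question's findall call.

```python
import re


def expand(d):
    token = re.compile(r"\w+")
    cache = {}  # fully expanded values, so each key is expanded only once

    def resolve(key, active):
        if key in cache:
            return cache[key]
        if key in active:
            # We came back to a key that is still being expanded.
            raise ValueError("cycle detected at key {!r}".format(key))
        active = active | {key}

        def substitute(match):
            word = match.group(0)
            # Expand words that are keys; leave everything else untouched.
            return resolve(word, active) if word in d else word

        cache[key] = token.sub(substitute, d[key])
        return cache[key]

    return {key: resolve(key, frozenset()) for key in d}


expanded = expand({"a": "b+c", "b": "f+g", "f": "y+u"})
print(expanded)
```

Very deeply nested definitions could still hit Python's recursion limit, so the iterative version may remain preferable for such inputs.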
The problem with iterative methods like that is that their runtime is highly sensitive to the depth of the nesting and the order of the items in the dict. This recursive version runs in linear time with the total number of "segments" in the resulting dict, where a segment is each piece of the expression that came from one of the original values.
It also isn't dependent on what symbols are used, so long as the strings used as keys aren't used for anything else.
import re
# this function both returns and mutates
# so that each list only has to be flattened once
def flatten(lst):
    new_lst = []
    for i, item in enumerate(lst):
        if isinstance(item, list):
            new_lst.extend(flatten(item))
        else:
            new_lst.append(item)
    lst[:] = new_lst
    return lst
def flatten_symbols(d):
    # split the values using the keys as delimiters
    delims = re.compile('({})'.format('|'.join(d)))
    d = dict((key, delims.split(value)) for key, value in d.iteritems())
    # turn the value lists into recursive lists
    # replacing each occurrence of a key with the corresponding value
    for key, value in d.iteritems():
        for i, item in enumerate(value):
            if item in d:
                d[key][i] = d[item]
    # flatten the recursive lists
    return dict((key, ''.join(flatten(value))) for key, value in d.iteritems())
d={'s1':{'a':'b+c','b':'f+g', 'f': 'd+e', 'e': 'h+i'},'s2':{'a':'b+c','b':'f+g'}}
new_d = dict((key, flatten_symbols(subdict)) for key, subdict in d.iteritems())
print new_d
You need to put that code in a function. The line of your comment should then call that function on whatever you want to replace, put it into the string, and assign the result into the dict.
This is an iterative procedure with a fuse that blows with too many iterations to check mutual infinite substitution.
Uses regex to split the string with multiple delimiters.
Normalizes the string to remove whitespaces.
Escapes the tokens, so you do not need to escape the delimiters.
Try the following implementation.
>>> import re, string
>>> def replace(d,delims,limit=5):
        #Remove any whitespace characters
        d=dict((k,v.translate(None,string.whitespace)) for k,v in d.iteritems())
        #Escape the regex tokens
        delims=re.escape(delims)
        for i in range(limit): #Loop Limit, to prevent infinite Loop
            changed=False
            for k,v in d.iteritems():
                #Its best to use regex if multiple tokens is involved
                r="+".join(d.get(e,e) for e in re.split(delims,v))
                if r!=v:
                    changed=True
                    d[k]=r
            #Break if no change in any iteration
            if not changed:
                break
        return d
>>> replace(d,"+")
{'a': 'y+u+g+c', 'b': 'y+u+g', 'f': 'y+u'}