How does the key argument to sorted work? - python

Code 1:
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
Code 2:
>>> student_tuples = [
... ('john', 'A', 15),
... ('jane', 'B', 12),
... ('dave', 'B', 10),
... ]
>>> from operator import itemgetter, attrgetter
>>>
>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Why in code 1, is () omitted in key=str.lower, and it reports error if parentheses are included, but in code 2 in key=itemgetter(2), the parentheses are kept?

The key argument to sorted expects a function, which sorted then applies to each item of the thing to be sorted. The results of key(item) are compared to each other, instead of each original item, during the sorting process.
You can imagine it working a bit like this:
def sorted(thing_to_sort, key):
    #
    # ... lots of complicated stuff ...
    #
    if key(x) < key(y):
        # do something
    else:
        # do something else
    #
    # ... lots more complicated stuff ...
    #
    return result
As you can see, the parentheses () are added to the function key inside sorted, applying it to x and y, which are items of thing_to_sort.
In your first example, str.lower is the function that gets applied to each x and y.
itemgetter is a bit different. It's a function which returns another function, and in your example, it's that other function which gets applied to x and y.
You can see how itemgetter works in the console:
>>> from operator import itemgetter
>>> item = ('john', 'A', 15)
>>> func = itemgetter(2)
>>> func(item)
15
It can be a little hard to get your head around "higher order" functions (ones which accept or return other functions) at first, but they're very useful for lots of different tasks, so it's worth experimenting with them until you feel comfortable.
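To make the "function that returns a function" idea concrete, here is a small sketch. The helper name make_getter is made up for illustration; it only mimics what itemgetter(2) does for a single index and is not how the operator module is actually implemented.

```python
from operator import itemgetter

student_tuples = [("john", "A", 15), ("jane", "B", 12), ("dave", "B", 10)]

def make_getter(index):
    """Hypothetical helper: return a function that fetches item[index]."""
    def getter(item):
        return item[index]
    return getter

# All three keys are functions of one argument; sorted() calls them itself.
by_itemgetter = sorted(student_tuples, key=itemgetter(2))
by_homemade = sorted(student_tuples, key=make_getter(2))
by_lambda = sorted(student_tuples, key=lambda item: item[2])

print(by_itemgetter == by_homemade == by_lambda)
```

In each case the parentheses after itemgetter or make_getter build the key function once; sorted then calls that function on every item.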

Poking around in the console a bit:
str.lower refers to the method 'lower' of 'str' objects.
Writing str.lower() calls that method, but it requires an argument, so written properly it would be str.lower("OH BOY"), which returns 'oh boy'. The error occurs because you did not pass any argument to a function that was expecting one.
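A short sketch of the difference between passing the function and calling it. The variable names are illustrative, and the exact wording of the error message varies between Python versions, so it is only printed here.

```python
# str.lower is the function object itself; calling it needs a string.
print(str.lower("OH BOY"))  # same result as "OH BOY".lower()

# Writing str.lower() calls it with no argument, which fails
# before sorted() even starts.
try:
    sorted(["b", "A"], key=str.lower())
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Passing the uncalled function lets sorted() call it once per item.
result = sorted(["b", "A", "c"], key=str.lower)
print(result)
```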

Related

How do I map multiple functions over a list?

It's come to this:
How do I map multiple functions over a list?
items = ['this', 'that', 100]
item_types = (type(i) for i in items)
items_gen = map(next(item_types), items)
This does not work, neither do many other things I've tried.
What am I missing?
I get either the first type mapped all over the entire list, or the type applied to the first item and itself cut into character snippets...
If this is a dupe - sorry, can't find this question in any reasonable way being asked here due to a million ways being asked.
I am going to switch out the items for input()'s so this is just a crude example, but I want to apply types to input values.
Expected output:
I want to call next on this item_gen object and get: 'this', 'that', 100
Not: 'this', 'that', '100'
[see EDIT below]
I think you need one more generator in the chain, if this is what you're looking for, a type conversion (or verification?) system?
def mapper(typedefs, target):
    igen = (i for i in typedefs)
    item_types = (type(i) for i in igen)
    return map(next(item_types), target)
so then if you say:
list(mapper(['a','b','c'],[1,2,3]))
You'd get:
['1','2','3']
This will throw ValueError exceptions in the reverse conversion, however.
[EDIT]: that's incorrect above. This looks good:
def foo(types, target):
    if len(types) == len(target):
        gen1 = (type(i) for i in types)
        gen2 = ([i] for i in target) #key is make this a list
        gen3 = (next(map(next(gen1),next(gen2))) for _ in types)
        yield from gen3
Now we get item-by-item type conversion (attempts):
bar = foo(['a',True,3.14,1],[1,1,1,1])
list(bar)
['1',True,1.0,1]
You can just zip the two iterables and apply each function to its matching element. Example:
>>> def f1(x): return x+1
...
>>> def f2(x): return x+2
...
>>> def f3(x): return x+3
...
>>> functions = [f1,f2,f3]
>>> elements = [1,1,1]
>>> [f(el) for f,el in zip(functions,elements)]
[2, 3, 4]
Which in your case becomes:
>>> [f(el) for f,el in zip(item_types,items)]
['this', 'that', 100]
To use map here, you need to map function application. Then you can use the multi-argument form of map, something like:
>>> funcs = [lambda x: x+1, lambda x: x*2, lambda x: x**3]
>>> data = [1, 2, 3]
>>> def apply(f, x): return f(x)
...
>>> list(map(apply, funcs, data))
[2, 4, 27]
Note, passing next(item_types) makes no sense, that gives you the next item in item_types, which in this case, is the first type, if you want to understand what you were seeing before.
Or, with your example:
>>> items = ['this', 'that', 100]
>>> list(map(apply, map(type, items), items))
['this', 'that', 100]
What you seem to want to do is create a generator out of a list (I am not sure what the whole type thing there is for).
To do just that you can just call iter:
item_gen = iter(items)
res = []
res.append(next(item_gen))
res.append(next(item_gen))
res.append(next(item_gen))
print(res)
will print (and the types will be the original ones):
['this', 'that', 100]
Assuming you have just one set of types in a types list and you want them to get applied you can do the following thing:
types = [str, str, int]
item_gen = (tp(i) for tp, i in zip(types, items))
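Since the question mentions swapping the items for input() values, here is a minimal sketch of that case, assuming the raw values arrive as strings and that you write the list of converters by hand. The names raw_values, converters and convert_all are made up for illustration.

```python
# Raw strings, standing in for values that would come from input().
raw_values = ["this", "that", "100"]
# One converter per position (an assumed, hand-written list).
converters = [str, str, int]

def convert_all(converters, values):
    """Hypothetical helper: apply converters[i] to values[i], lazily."""
    if len(converters) != len(values):
        raise ValueError("need exactly one converter per value")
    return (convert(value) for convert, value in zip(converters, values))

item_gen = convert_all(converters, raw_values)
converted = list(item_gen)
print(converted)
```

A converter such as int raises ValueError on input it cannot parse, so real input handling would probably want a try/except around the conversion.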

Replacing a single element in a tuple nested within a list - Is there a better way?

Edit - I want to change the value of a tuple nested in a list, at a specific position,
e.g. change nestedTuple[1][1] to 'xXXXXx'.
I have come up with this code, which works, but it just seems very 'un-pure':
convert to a list - change - convert to a tuple - insert back into the list.
I assume that it would be very demanding on resources.
Could anyone please advise me if there is a better way?
>>> nestedTuple= [('a','b','c'), ('d','e','f'), ('g','h','i')]
>>> tempList = list(nestedTuple[1])
>>> tempList[1] = 'xXXXXx'
>>> nestedTuple[1] = tuple(tempList)
>>> print nestedTuple
[('a', 'b', 'c'), ('d', 'xXXXXx', 'f'), ('g', 'h', 'i')]
You can use slicing.
>>> i = 1
>>> nestedTuple = [('a','b','c'), ('d','e','f'), ('g','h','i')]
>>> nestedTuple[1] = nestedTuple[1][:i] + ('xXXXXx', ) + nestedTuple[1][i+1:]
>>> nestedTuple
[('a', 'b', 'c'), ('d', 'xXXXXx', 'f'), ('g', 'h', 'i')]
How about this?
nestedTuple[1] = tuple('XXXXX' if i == 1 else x for i, x in enumerate(nestedTuple[1]))
Note that tuples aren't meant to be changed, so one liners aren't going to be very clean.
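If you do this in more than one place, the slicing approach can be wrapped in a small function. This is only a sketch; replace_at is a made-up name, not a standard library function.

```python
def replace_at(tup, index, value):
    """Hypothetical helper: return a copy of tup with tup[index] replaced."""
    if not -len(tup) <= index < len(tup):
        raise IndexError("tuple index out of range")
    index %= len(tup)  # normalise negative indexes
    return tup[:index] + (value,) + tup[index + 1:]

nestedTuple = [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i')]
original_row = nestedTuple[1]
nestedTuple[1] = replace_at(nestedTuple[1], 1, 'xXXXXx')

print(nestedTuple)
print(original_row)  # the old tuple object itself is unchanged
```

Either way a new tuple is built; only the list slot is reassigned.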
Depending on how many changes you want to make in your nestedTuple, and on what happens downstream in your program, you may want to build a nestedList from your nestedTuple:
nestedList = [list(myTuple) for myTuple in nestedTuple]
and then do:
nestedList[x][y] = 'truc'
and then make a new nestedTuple if needed
Otherwise you should profile this
I understand that this is the data structure you are given. Performance aside, it would make for much cleaner and more readable code to convert the data to a nested list, do the manipulation, and, if you need to write it back, convert it back to a nested tuple. This may be suboptimal in terms of speed, but that might not be the limiting factor for your application.
nestedTuple= [('a','b','c'), ('d','e','f'), ('g','h','i')]
nestedList = [list(x) for x in nestedTuple]
Now you can use normal list indexing and assignment:
nestedList[1][1] = 'xxxxXXxxx'
if you need the data back in original nested tuple format use the one liner:
nestedTuple = [tuple(x) for x in nestedList]
This is the most readable approach and the least likely to contain bugs if your data structure grows and your slicing becomes more complex.
The very purpose of a tuple is that it is immutable: once the tuple is created, its values cannot be changed. In your case the best way is to use nested lists, as shown below:
>>> nestedList = [['a','b','c'], ['d','e','f'], ['g','h','i']]
Now, to change the element 'e' in the list to 'xxxx', you can do the following:
>>> nestedList[1][1] = 'xxxx'

How to print elements in a list in new lines?

I have a list
L = Counter(mywords)
Where
mywords = ['Well', 'Jim', 'opportunity', 'I', 'Governor', 'University', 'Denver', 'hospitality', 'There', 'lot', 'points', 'I', 'make', 'tonight', 'important', '20', 'years', 'ago', 'I', 'luckiest', 'man', 'earth', 'Michelle', 'agreed', 'marry', '(Laughter)', 'And', 'I', 'Sweetie', 'happy']
It's much longer than that but that's a snippet.
Now what I do next is:
print ("\n".join(c.most_common(10)))
Because I want it to show the 10 most commonly used words in that list AND their counts, but I want it to print out into new lines for each item in the list, instead I get this error:
TypeError: sequence item 0: expected str instance, tuple found
Any help would be appreciated, using Python 3.
print ("\n".join(map(str, c.most_common(10))))
If you want more control over the format, you can use a format string like this
print ("\n".join("{}: {}".format(k,v) for k,v in c.most_common(10)))
The simplest is:
for item, freq in L.most_common(10):
    print(item, 'has a count of', freq) # or
    print('there are {} occurrences of "{}"'.format(freq, item))
If you just want the strings:
print("\n".join(element for element, count in c.most_common(10)))
If you want the strings and the counts printed in the form ('foo', 11):
print ("\n".join(str(element_and_count)
                 for element_and_count in c.most_common(10)))
If you want the strings and counts in some other format of your choice:
print ("\n".join("{}: {}".format(element, count)
                 for element, count in c.most_common(10)))
Why? The most_common function returns (element, count) pairs. Those things are tuples, not strings. You can't just join tuples together. You can, of course, convert it to a string (option #2 above), but that only works if you actually want the format ('foo', 11) for each line. To get the other two options, you want to ignore half the tuple and use the other, or write your own format expression.
In any case, you want to do something to each member of the sequence returned by most_common. The Pythonic way to do that is with a list comprehension or generator expression.
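Here is a self-contained sketch of that idea. The word list is a shortened stand-in for the one in the question, chosen so that the counts are all different.

```python
from collections import Counter

# A shortened stand-in for the question's word list.
mywords = ['I', 'man', 'I', 'earth', 'I', 'man']
c = Counter(mywords)

# Each item from most_common() is an (element, count) tuple,
# so unpack it and build a string before joining.
lines = ["{}: {}".format(element, count) for element, count in c.most_common(2)]
report = "\n".join(lines)
print(report)
```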
Meanwhile, you should learn how to debug these kinds of cases. When join gives you a TypeError, break it up into pieces until you find the one that stops working (and try it with 2 instead of 10, just so there's less to read):
>>> print("\n".join(c.most_common(2)))
TypeError: sequence item 0: expected str instance, tuple found
>>> c.most_common(2)
[('I', 4), ('man', 1)]
Aha! Each thing in the list is a tuple of two things, not just a string. Why?
>>> help(c.most_common)
most_common(self, n=None) method of collections.Counter instance
List the n most common elements and their counts from the most
common to the least. If n is None, then list all element counts.
>>> Counter('abcdeabcdabcaba').most_common(3)
[('a', 5), ('b', 4), ('c', 3)]
OK, so it returns the most common elements and their counts. I just want the elements. So:
>>> [element for element, count in c.most_common(2)]
['I', 'man']
Now that's something I can join:
>>> '\n'.join([element for element, count in c.most_common(2)])
'I\nman'
And I don't need both brackets and parens (I can just use a generator expression instead of a list comprehension):
>>> '\n'.join(element for element, count in c.most_common(2))
'I\nman'
And now, I can print it:
>>> print('\n'.join(element for element, count in c.most_common(2)))
I
man
And now that it's working, print all 10:
>>> print('\n'.join(element for element, count in c.most_common(10)))
I'm surprised that nobody suggested using the unpacking operator *. Since you say Python 3, why not do the following?
print(*[x[0] for x in L.most_common(10)], sep="\n")
Related questions
Python 3.3: separation argument (sep) giving an error
What does ** (double star) and * (star) do for parameters?
List comprehensions

Implement lookahead iterator for strings in Python

I'm doing some parsing that requires one token of lookahead. What I'd like is a fast function (or class?) that would take an iterator and turn it into a list of tuples in the form (token, lookahead), such that:
>>> a = ['a', 'b', 'c', 'd']
>>> list(lookahead(a))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', None)]
basically, this would be handy for looking ahead in iterators like this:
for (token, lookahead_1) in lookahead(a):
    pass
Though, I'm not sure if there's a name for this technique or function in itertools that already will do this. Any ideas?
Thanks!
There are easier ways if you are just using lists - see Sven's answer. Here is one way to do it for general iterators
>>> from itertools import tee, izip_longest
>>> a = ['a', 'b', 'c', 'd']
>>> it1, it2 = tee(iter(a))
>>> next(it2) # discard this first value
'a'
>>> [(x,y) for x,y in izip_longest(it1, it2)]
# or just list(izip_longest(it1, it2))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', None)]
Here's how to use it in a for loop like in your question.
>>> it1,it2 = tee(iter(a))
>>> next(it2)
'a'
>>> for (token, lookahead_1) in izip_longest(it1,it2):
...     print token, lookahead_1
...
a b
b c
c d
d None
Finally, here's the function you are looking for
>>> def lookahead(it):
...     it1, it2 = tee(iter(it))
...     next(it2)
...     return izip_longest(it1, it2)
...
>>> for (token, lookahead_1) in lookahead(a):
...     print token, lookahead_1
...
a b
b c
c d
d None
I like both Sven's and gnibbler's answers, but for some reason, it pleases me to roll my own generator.
def lookahead(iterable, null_item=None):
    iterator = iter(iterable) # in case a list is passed
    prev = iterator.next()
    for item in iterator:
        yield prev, item
        prev = item
    yield prev, null_item
Tested:
>>> for i in lookahead(x for x in []):
...     print i
...
>>> for i in lookahead(x for x in [0]):
...     print i
...
(0, None)
>>> for i in lookahead(x for x in [0, 1, 2]):
...     print i
...
(0, 1)
(1, 2)
(2, None)
Edit: Karl and ninjagecko raise an excellent point -- the sequence passed in may contain None, and so using None as the final lookahead value may lead to ambiguity. But there's no obvious alternative; a module-level constant is possibly the best approach in many cases, but may be overkill for a one-off function like this -- not to mention the fact that bool(object()) == True, which could lead to unexpected behavior. Instead, I've added a null_item parameter with a default of None -- that way users can pass in whatever makes sense for their needs, be it a simple object() sentinel, a constant of their own creation, or even a class instance with special behavior. Since most of the time None is the obvious and even possibly the expected behavior, I've left None as the default.
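The generator above is written for Python 2 (iterator.next() and print statements). Below is a sketch of how the same idea might look in Python 3. The explicit try/except is there because, since Python 3.7, a StopIteration that escapes inside a generator is turned into a RuntimeError. The END sentinel is a made-up name for illustration.

```python
def lookahead(iterable, null_item=None):
    """Yield (item, next_item) pairs; the last pair ends with null_item."""
    iterator = iter(iterable)
    try:
        prev = next(iterator)
    except StopIteration:
        return  # empty input yields no pairs
    for item in iterator:
        yield prev, item
        prev = item
    yield prev, null_item

END = object()  # hypothetical sentinel that cannot collide with real data
pairs = list(lookahead([None, None], null_item=END))
print(pairs[0])            # a genuine pair of None values
print(pairs[1][1] is END)  # the end marker is unambiguous
```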
The usual way to do this for a list a is
from itertools import izip_longest
for token, lookahead in izip_longest(a, a[1:]):
    pass
For the last token, you will get None as look-ahead token.
If you want to avoid the copy of the list introduced by a[1:], you can use islice(a, 1, None) instead. For a slight modification working for arbitrary iterables, see the answer by gnibbler. For a simple, easy to grasp generator function also working for arbitrary iterables, see the answer by senderle.
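For reference, a sketch of the islice variant in Python 3, where izip_longest is named zip_longest. It assumes a is a list (or another re-iterable sequence), not a one-shot iterator.

```python
from itertools import islice, zip_longest

a = ['a', 'b', 'c', 'd']

# islice(a, 1, None) walks the list from index 1 without copying it.
pairs = list(zip_longest(a, islice(a, 1, None)))
print(pairs)
```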
You might find the answer to your question here: Using lookahead with generators.
I consider all these answers incorrect, because they will cause unforeseen bugs if your list contains None. Here is my take:
SEQUENCE_END = object()
def lookahead(iterable):
    it = iter(iterable)  # not named 'iter', which would shadow the builtin
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: yield nothing
    for ahead in it:
        yield current, ahead
        current = ahead
    yield current, SEQUENCE_END
Example:
>>> for x, ahead in lookahead(range(3)):
...     print(x, ahead)
...
0 1
1 2
2 <object object at 0x...>
Example of how this answer is better:
def containsDoubleElements(seq):
    """
    Returns whether seq contains double elements, e.g. [1,2,2,3]
    """
    return any(val==nextVal for val,nextVal in lookahead(seq))
>>> containsDoubleElements([None])
False # correct!
def containsDoubleElements_BAD(seq):
    """
    Returns whether seq contains double elements, e.g. [1,2,2,3]
    """
    return any(val==nextVal for val,nextVal in lookahead_OTHERANSWERS(seq))
>>> containsDoubleElements_BAD([None])
True # incorrect!

Is there a 'multimap' implementation in Python?

I am new to Python, and I am familiar with implementations of Multimaps in other languages. Does Python have such a data structure built-in, or available in a commonly-used library?
To illustrate what I mean by "multimap":
a = multidict()
a[1] = 'a'
a[1] = 'b'
a[2] = 'c'
print(a[1]) # prints: ['a', 'b']
print(a[2]) # prints: ['c']
Such a thing is not present in the standard library. You can use a defaultdict though:
>>> from collections import defaultdict
>>> md = defaultdict(list)
>>> md[1].append('a')
>>> md[1].append('b')
>>> md[2].append('c')
>>> md[1]
['a', 'b']
>>> md[2]
['c']
(Instead of list you may want to use set, in which case you'd call .add instead of .append.)
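A quick sketch of the set variant, which also shows one thing to watch for with any defaultdict: indexing a missing key creates it.

```python
from collections import defaultdict

md = defaultdict(set)
md[1].add('a')
md[1].add('b')
md[1].add('a')  # duplicate value, silently ignored by the set
md[2].add('c')

print(sorted(md[1]))
print(3 in md)  # membership tests do not create keys
print(md[3])    # but indexing a missing key inserts an empty set
print(3 in md)
```

If you do not want lookups to add keys, use md.get(key, set()) or test with in first.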
As an aside: look at these two lines you wrote:
a[1] = 'a'
a[1] = 'b'
This seems to indicate that you want the expression a[1] to be equal to two distinct values. This is not possible with dictionaries because their keys are unique and each of them is associated with a single value. What you can do, however, is extract all values inside the list associated with a given key, one by one. You can use iter followed by successive calls to next for that. Or you can just use two loops:
>>> for k, v in md.items():
...     for w in v:
...         print("md[%d] = '%s'" % (k, w))
...
md[1] = 'a'
md[1] = 'b'
md[2] = 'c'
Just for future visitors: there is currently a Python implementation of Multimap, available via PyPI.
Stephan202 has the right answer, use defaultdict. But if you want something with the interface of C++ STL multimap and much worse performance, you can do this:
multimap = []
multimap.append( (3,'a') )
multimap.append( (2,'x') )
multimap.append( (3,'b') )
multimap.sort()
Now when you iterate through multimap, you'll get pairs like you would in a std::multimap. Unfortunately, that means your loop code will start to look as ugly as C++.
def multimap_iter(multimap,minkey,maxkey=None):
    maxkey = minkey if (maxkey is None) else maxkey
    for k,v in multimap:
        if k<minkey: continue
        if k>maxkey: break
        yield k,v
# this will print 'a','b'
for k,v in multimap_iter(multimap,3,3):
    print v
In summary, defaultdict is really cool and leverages the power of python and you should use it.
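If you do keep the sorted list of pairs, the linear scan can be replaced by a binary search from the standard bisect module. This is a sketch, not part of the answer above; values_for is a made-up helper, and it assumes the list is already sorted and that its keys can be compared with each other.

```python
from bisect import bisect_left

# Sorted list of (key, value) pairs, as in the list-based approach.
multimap = sorted([(3, 'a'), (2, 'x'), (3, 'b')])

def values_for(pairs, key):
    """Hypothetical helper: all values stored under key in sorted pairs."""
    # The 1-tuple (key,) sorts before every (key, value) pair, so this
    # finds the position of the first entry for key, if there is one.
    i = bisect_left(pairs, (key,))
    found = []
    while i < len(pairs) and pairs[i][0] == key:
        found.append(pairs[i][1])
        i += 1
    return found

print(values_for(multimap, 3))
print(values_for(multimap, 7))
```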
You can take a list of tuples and then sort it as if it were a multimap.
listAsMultimap=[]
Let's append some elements (tuples):
listAsMultimap.append((1,'a'))
listAsMultimap.append((2,'c'))
listAsMultimap.append((3,'d'))
listAsMultimap.append((2,'b'))
listAsMultimap.append((5,'e'))
listAsMultimap.append((4,'d'))
Now sort it.
listAsMultimap=sorted(listAsMultimap)
After printing it you will get:
[(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd'), (4, 'd'), (5, 'e')]
That means it is working as a Multimap!
Please note that values are also sorted in ascending order when the keys are the same (for key=2, 'b' comes before 'c' although we didn't append them in this order), because tuples are compared element by element.
If you want to get them in descending order just change the sorted() function like this:
listAsMultimap=sorted(listAsMultimap,reverse=True)
And after you will get output like this:
[(5, 'e'), (4, 'd'), (3, 'd'), (2, 'c'), (2, 'b'), (1, 'a')]
Similarly here values are in descending order if the keys are the same.
The standard way to write this in Python is with a dict whose elements are each a list or set. As stephan202 says, you can somewhat automate this with a defaultdict, but you don't have to.
In other words I would translate your code to
a = dict()
a[1] = ['a', 'b']
a[2] = ['c']
print(a[1]) # prints: ['a', 'b']
print(a[2]) # prints: ['c']
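When the values arrive one at a time, a plain dict can still build the lists for you via dict.setdefault. A minimal sketch, with the input pairs made up to match the question's example:

```python
a = {}
for key, value in [(1, 'a'), (1, 'b'), (2, 'c')]:
    # setdefault returns the existing list for key, or stores and
    # returns a new empty list if key is missing.
    a.setdefault(key, []).append(value)

print(a[1])
print(a[2])
```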
Or subclass dict:
class Multimap(dict):
    def __setitem__(self, key, value):
        if key not in self:
            dict.__setitem__(self, key, [value]) # call super method to avoid recursion
        else:
            self[key].append(value)
There is no multi-map in the Python standard libs currently.
WebOb has a MultiDict class used to represent HTML form values, and it is used by a few Python Web frameworks, so the implementation is battle tested.
Werkzeug also has a MultiDict class, and for the same reason.
