Implement lookahead iterator for strings in Python


I'm doing some parsing that requires one token of lookahead. What I'd like is a fast function (or class?) that would take an iterator and turn it into a list of tuples in the form (token, lookahead), such that:
>>> a = ['a', 'b', 'c', 'd']
>>> list(lookahead(a))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', None)]
Basically, this would be handy for looking ahead in iterators like this:
for (token, lookahead_1) in lookahead(a):
    pass
Though, I'm not sure if there's a name for this technique or function in itertools that already will do this. Any ideas?
Thanks!

There are easier ways if you are just using lists - see Sven's answer. Here is one way to do it for general iterators
>>> from itertools import tee, izip_longest
>>> a = ['a', 'b', 'c', 'd']
>>> it1, it2 = tee(iter(a))
>>> next(it2) # discard this first value
'a'
>>> [(x,y) for x,y in izip_longest(it1, it2)]
# or just list(izip_longest(it1, it2))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', None)]
Here's how to use it in a for loop like in your question.
>>> it1,it2 = tee(iter(a))
>>> next(it2)
'a'
>>> for (token, lookahead_1) in izip_longest(it1,it2):
...     print token, lookahead_1
...
a b
b c
c d
d None
Finally, here's the function you are looking for
>>> def lookahead(it):
...     it1, it2 = tee(iter(it))
...     next(it2, None)  # the default avoids StopIteration on an empty input
...     return izip_longest(it1, it2)
...
>>> for (token, lookahead_1) in lookahead(a):
...     print token, lookahead_1
...
a b
b c
c d
d None
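On Python 3, izip_longest is named zip_longest and print is a function. A minimal sketch of the same tee-based approach under that assumption (the fill parameter is an addition for illustration, not part of the answer above):

```python
from itertools import tee, zip_longest

def lookahead(iterable, fill=None):
    """Return an iterator of (item, next_item) pairs; the last pair uses `fill`."""
    it1, it2 = tee(iterable)
    next(it2, None)  # advance the second copy; the default avoids StopIteration on empty input
    return zip_longest(it1, it2, fillvalue=fill)

pairs = list(lookahead("abcd"))
print(pairs)
```

Passing a default to next() means an empty input simply produces no pairs instead of raising.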

I like both Sven's and gnibbler's answers, but for some reason, it pleases me to roll my own generator.
def lookahead(iterable, null_item=None):
    iterator = iter(iterable)  # in case a list is passed
    try:
        prev = next(iterator)
    except StopIteration:
        return  # empty input: yield nothing
    for item in iterator:
        yield prev, item
        prev = item
    yield prev, null_item
Tested:
>>> for i in lookahead(x for x in []):
...     print i
...
>>> for i in lookahead(x for x in [0]):
...     print i
...
(0, None)
>>> for i in lookahead(x for x in [0, 1, 2]):
...     print i
...
(0, 1)
(1, 2)
(2, None)
Edit: Karl and ninjagecko raise an excellent point -- the sequence passed in may contain None, and so using None as the final lookahead value may lead to ambiguity. But there's no obvious alternative; a module-level constant is possibly the best approach in many cases, but may be overkill for a one-off function like this -- not to mention the fact that bool(object()) == True, which could lead to unexpected behavior. Instead, I've added a null_item parameter with a default of None -- that way users can pass in whatever makes sense for their needs, be it a simple object() sentinel, a constant of their own creation, or even a class instance with special behavior. Since most of the time None is the obvious and even possibly the expected behavior, I've left None as the default.

The usual way to do this for a list a is
from itertools import izip_longest
for token, lookahead in izip_longest(a, a[1:]):
    pass
For the last token, you will get None as look-ahead token.
If you want to avoid the copy of the list introduced by a[1:], you can use islice(a, 1, None) instead. For a slight modification working for arbitrary iterables, see the answer by gnibbler. For a simple, easy to grasp generator function also working for arbitrary iterables, see the answer by senderle.
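The islice variant mentioned above might look like this on Python 3 (where izip_longest is spelled zip_longest); this is a sketch that assumes a is a list or another re-iterable sequence:

```python
from itertools import islice, zip_longest

a = ['a', 'b', 'c', 'd']
# islice(a, 1, None) walks the list from index 1 without building the copy a[1:]
pairs = list(zip_longest(a, islice(a, 1, None)))
print(pairs)
```

This relies on a being iterable more than once. With a one-shot iterator, both arguments would consume the same underlying stream, which is the case the tee-based and generator-based answers handle.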

You might find the answer to your question here: Using lookahead with generators.

I consider all these answers incorrect, because they will cause unforeseen bugs if your list contains None. Here is my take:
SEQUENCE_END = object()
def lookahead(iterable):
    it = iter(iterable)  # avoid shadowing the built-in iter
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: yield nothing
    for ahead in it:
        yield current, ahead
        current = ahead
    yield current, SEQUENCE_END
Example:
>>> for x, ahead in lookahead(range(3)):
...     print(x, ahead)
...
0 1
1 2
2 <object object at 0x...>
Example of how this answer is better:
def containsDoubleElements(seq):
    """
    Returns whether seq contains double elements, e.g. [1,2,2,3]
    """
    return any(val==nextVal for val,nextVal in lookahead(seq))
>>> containsDoubleElements([None])
False # correct!
def containsDoubleElements_BAD(seq):
    """
    Returns whether seq contains double elements, e.g. [1,2,2,3]
    """
    return any(val==nextVal for val,nextVal in lookahead_OTHERANSWERS(seq))
>>> containsDoubleElements_BAD([None])
True # incorrect!
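Because the sentinel is a bare object(), callers would normally test for it by identity (is) rather than by truthiness or equality. A minimal sketch, in which describe is a hypothetical helper added for illustration and the generator is restated so the block runs on its own:

```python
SEQUENCE_END = object()  # unique sentinel; no value in the data can be identical to it

def lookahead(iterable):
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input yields nothing
    for ahead in it:
        yield current, ahead
        current = ahead
    yield current, SEQUENCE_END

def describe(seq):
    """Label each item 'last' or 'more' by testing sentinel identity (hypothetical helper)."""
    return [(x, "last" if ahead is SEQUENCE_END else "more")
            for x, ahead in lookahead(seq)]

print(describe([None, None]))
```

Even a sequence made entirely of None values is labelled correctly, since None is never the sentinel object.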

Related

Indexing a list with nested lists [duplicate]

The list.index(x) function returns the index in the list of the first item whose value is x.
Is there a function, list_func_index(), similar to index(), that takes a function f() as a parameter? The function f() is run on every element e of the list until f(e) returns True; list_func_index() then returns the index of e.
Codewise:
>>> def list_func_index(lst, func):
...     for i in range(len(lst)):
...         if func(lst[i]):
...             return i
...     raise ValueError('no element making func True')
>>> l = [8,10,4,5,7]
>>> def is_odd(x): return x % 2 != 0
>>> list_func_index(l,is_odd)
3
Is there a more elegant solution? (and a better name for the function)

You could do that in a one-liner using generators:
next(i for i,v in enumerate(l) if is_odd(v))
The nice thing about generators is that they only compute up to the requested amount. So requesting the first two indices is (almost) just as easy:
y = (i for i,v in enumerate(l) if is_odd(v))
x1 = next(y)
x2 = next(y)
Though, expect a StopIteration exception after the last index (that is how generators work). This is also convenient in your "take-first" approach, to know that no such value was found --- the list.index() function would throw ValueError here.
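If an exception is not wanted when nothing matches, the built-in next() also accepts a default as its second argument. A short sketch reusing the names from the question (the choice of None as the default is an assumption for illustration):

```python
l = [8, 10, 4, 5, 7]

def is_odd(x):
    return x % 2 != 0

# The second argument to next() is returned when the generator is exhausted
first_odd = next((i for i, v in enumerate(l) if is_odd(v)), None)
first_big = next((i for i, v in enumerate(l) if v > 100), None)
print(first_odd, first_big)
```

Note the extra parentheses: a generator expression must be parenthesized when it is not the only argument.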

One possibility is the built-in enumerate function:
def index_of_first(lst, pred):
    for i,v in enumerate(lst):
        if pred(v):
            return i
    return None
It's typical to refer to a function like the one you describe as a "predicate"; it returns true or false for some question. That's why I call it pred in my example.
I also think it would be better form to return None, since that's the real answer to the question. The caller can choose to explode on None, if required.

@Paul's accepted answer is best, but here's a little lateral-thinking variant, mostly for amusement and instruction purposes...:
>>> class X(object):
... def __init__(self, pred): self.pred = pred
... def __eq__(self, other): return self.pred(other)
...
>>> l = [8,10,4,5,7]
>>> def is_odd(x): return x % 2 != 0
...
>>> l.index(X(is_odd))
3
essentially, X's purpose is to change the meaning of "equality" from the normal one to "satisfies this predicate", thereby allowing the use of predicates in all kinds of situations that are defined as checking for equality -- for example, it would also let you code, instead of if any(is_odd(x) for x in l):, the shorter if X(is_odd) in l:, and so forth.
Worth using? Not when a more explicit approach like that taken by @Paul is just as handy (especially when changed to use the new, shiny built-in next function rather than the older, less appropriate .next method, as I suggest in a comment to that answer), but there are other situations where it (or other variants of the idea "tweak the meaning of equality", and maybe other comparators and/or hashing) may be appropriate. Mostly, worth knowing about the idea, to avoid having to invent it from scratch one day;-).

Not one single function, but you can do it pretty easily:
>>> test = lambda c: c == 'x'
>>> data = ['a', 'b', 'c', 'x', 'y', 'z', 'x']
>>> map(test, data).index(True)
3
>>>
If you don't want to evaluate the entire list at once you can use itertools, but it's not as pretty:
>>> from itertools import imap, ifilter
>>> from operator import itemgetter
>>> test = lambda c: c == 'x'
>>> data = ['a', 'b', 'c', 'x', 'y', 'z']
>>> ifilter(itemgetter(1), enumerate(imap(test, data))).next()[0]
3
>>>
Just using a generator expression is probably more readable than itertools though.
Note in Python3, map and filter return lazy iterators and you can just use:
from operator import itemgetter
test = lambda c: c == 'x'
data = ['a', 'b', 'c', 'x', 'y', 'z']
next(filter(itemgetter(1), enumerate(map(test, data))))[0] # 3

A variation on Alex's answer. This avoids having to type X every time you want to use is_odd or whichever predicate
>>> class X(object):
... def __init__(self, pred): self.pred = pred
... def __eq__(self, other): return self.pred(other)
...
>>> L = [8,10,4,5,7]
>>> is_odd = X(lambda x: x%2 != 0)
>>> L.index(is_odd)
3
>>> less_than_six = X(lambda x: x<6)
>>> L.index(less_than_six)
2

you could do this with a list-comprehension:
l = [8,10,4,5,7]
filterl = [a for a in l if a % 2 != 0]
Then filterl will return all members of the list fulfilling the expression a % 2 != 0. I would say a more elegant method...

Intuitive one-liner solution:
i = list(map(lambda value: value > 0, data)).index(True)
Explanation:
We use the map function to build a sequence of True/False values according to whether each element of the list passes the condition in the lambda.
Then we convert the map output to a list.
Then, using the index function, we get the index of the first True, which is the index of the first value passing the condition.

How does the key argument to sorted work?

Code 1:
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
Code 2:
>>> student_tuples = [
... ('john', 'A', 15),
... ('jane', 'B', 12),
... ('dave', 'B', 10),
... ]
>>> from operator import itemgetter, attrgetter
>>>
>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Why in code 1, is () omitted in key=str.lower, and it reports error if parentheses are included, but in code 2 in key=itemgetter(2), the parentheses are kept?

The key argument to sorted expects a function, which sorted then applies to each item of the thing to be sorted. The results of key(item) are compared to each other, instead of each original item, during the sorting process.
You can imagine it working a bit like this:
def sorted(thing_to_sort, key):
    #
    # ... lots of complicated stuff ...
    #
    if key(x) < key(y):
        # do something
    else:
        # do something else
    #
    # ... lots more complicated stuff ...
    #
    return result
As you can see, the parentheses () are added to the function key inside sorted, applying it to x and y, which are items of thing_to_sort.
In your first example, str.lower is the function that gets applied to each x and y.
itemgetter is a bit different. It's a function which returns another function, and in your example, it's that other function which gets applied to x and y.
You can see how itemgetter works in the console:
>>> from operator import itemgetter
>>> item = ('john', 'A', 15)
>>> func = itemgetter(2)
>>> func(item)
15
It can be a little hard to get your head around "higher order" functions (ones which accept or return other functions) at first, but they're very useful for lots of different tasks, so it's worth experimenting with them until you feel comfortable.
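To make the role of key concrete, here is a small sketch comparing itemgetter(2), an equivalent lambda, and a hand-rolled decorate-sort-undecorate pass. The last one is only an approximation of what sorted does internally, written out for illustration:

```python
from operator import itemgetter

student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

by_getter = sorted(student_tuples, key=itemgetter(2))
by_lambda = sorted(student_tuples, key=lambda item: item[2])

# Decorate-sort-undecorate: pair each item with its key, sort, then strip the key.
# The index breaks ties so the original items are never compared directly.
decorated = [(item[2], index, item) for index, item in enumerate(student_tuples)]
by_hand = [item for _, _, item in sorted(decorated)]

print(by_getter == by_lambda == by_hand)
```

In all three cases a function object is handed over (or applied) per item, which is why key=str.lower is written without parentheses while itemgetter(2) is called once up front to produce such a function.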

Poking around in the console a bit:
str.lower refers to the method lower of str objects.
str.lower() calls that function, but it requires an argument: properly written, it would be str.lower("OH BOY"), which returns 'oh boy'. The error occurs because you did not pass any argument to a function that was expecting one.

Understanding builtin next() function

I read the documentation on next() and I understand it abstractly. From what I understand, next() is used as a reference to an iterable object and makes python cycle to the next iterable object sequentially. Makes sense! My question is, how is this useful outside the context of the builtin for loop? When would someone ever need to use next() directly? Can someone provide a simplistic example? Thanks mates!

As luck would have it, I wrote one yesterday:
def skip_letters(f, skip=" "):
    """Wrapper function to skip specified characters when encrypting."""
    def func(plain, *args, **kwargs):
        # the generator expression needs its own parentheses when other arguments follow
        gen = f((p for p in plain if p not in skip), *args, **kwargs)
        for p in plain:
            if p in skip:
                yield p
            else:
                yield next(gen)
    return func
This uses next to get the return values from the generator function f, but interspersed with other values. This allows some values to be passed through the generator, but others to be yielded straight out.

There are many places where we can use next, for example:
Drop the header while reading a file.
with open(filename) as f:
    next(f) #drop the first line
    #now do something with rest of the lines
Iterator-based implementation of zip(seq, seq[1:]) (from the pairwise recipe in itertools):
from itertools import tee, izip
it1, it2 = tee(seq)
next(it2)
izip(it1, it2)
Get the first item that satisfies a condition:
next(x for x in seq if x % 100)
Creating a dictionary using adjacent items as key-value:
>>> it = iter(['a', 1, 'b', 2, 'c', '3'])
>>> {k: next(it) for k in it}
{'a': 1, 'c': '3', 'b': 2}
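The header-skipping case above can be tried without a real file. In this sketch io.StringIO stands in for an open file (an assumption for illustration), and a default is passed to next() so an empty file does not raise StopIteration:

```python
import io

# io.StringIO stands in for a real file object here (illustrative assumption)
f = io.StringIO("name,age\njohn,15\njane,12\n")

header = next(f, None)  # consume the first line; None if the file is empty
rows = [line.rstrip("\n").split(",") for line in f]  # iteration resumes after the header

print(header.strip(), rows)
```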

next is useful in many different ways, even outside of a for-loop. For example, if you have an iterable of objects and you want the first that meets a condition, you can give it a generator expression like so:
>>> lst = [1, 2, 'a', 'b']
>>> # Get the first item in lst that is a string
>>> next(x for x in lst if isinstance(x, str))
'a'
>>> # Get the first item in lst that != 1
>>> lst = [1, 1, 1, 2, 1, 1, 3]
>>> next(x for x in lst if x != 1)
2
>>>

Generate combinations of elements from multiple lists

I'm making a function that takes a variable number of lists as input (i.e., an arbitrary argument list).
I need to compare each element from each list to each element of all other lists, but I couldn't find any way to approach this.

Depending on your goal, you can make use of some of the itertools utilities. For example, you can use itertools.product on *args:
from itertools import product
for comb in product(*args):
    if len(set(comb)) < len(comb):
        # there are equal values....
        pass
But currently it's not very clear from your question what you want to achieve. If I didn't understand you correctly, you can try to state the question in a more specific way.

I think @LevLeitsky's answer is the best way to do a loop over the items from your variable number of lists. However, if the purpose of the loop is just to find common elements between pairs of items from the lists, I'd do it a bit differently.
Here's an approach that finds the common elements between each pair of lists:
import itertools
def func(*args):
    sets = [set(l) for l in args]
    for a, b in itertools.combinations(sets, 2):
        common = a & b # set intersection
        # do stuff with the set of common elements...
I'm not sure what you need to do with the common elements, so I'll leave it there.
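As one possible way to finish that loop, the sketch below collects the intersections into a dictionary keyed by the positions of the two lists. The function name common_between_pairs and the choice of return value are assumptions for illustration, and the elements are assumed to be hashable:

```python
import itertools

def common_between_pairs(*args):
    """Return {(i, j): common elements} for every pair of input lists."""
    sets = [set(l) for l in args]
    result = {}
    # enumerate() keeps track of which two lists each intersection came from
    for (i, a), (j, b) in itertools.combinations(enumerate(sets), 2):
        result[(i, j)] = a & b  # set intersection
    return result

print(common_between_pairs([1, 2, 3], [2, 3, 4], [3, 5]))
```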

The itertools module provides a lot of useful tools just for such tasks. You can adapt the following example to your task by integrating it into your specific comparison logic.
Note that the following assumes a commutative function. That is, about half of the tuples are omitted for reasons of symmetry.
Example:
import itertools
def generate_pairs(*args):
    # assuming function is commutative
    for i, l in enumerate(args, 1):
        for x, y in itertools.product(l, itertools.chain(*args[i:])):
            yield (x, y)
# you can use lists instead of strings as well
for x, y in generate_pairs("ab", "cd", "ef"):
    print (x, y)
# e.g., apply your comparison logic
print any(x == y for x, y in generate_pairs("ab", "cd", "ef"))
print all(x != y for x, y in generate_pairs("ab", "cd", "ef"))
Output:
$ python test.py
('a', 'c')
('a', 'd')
('a', 'e')
('a', 'f')
('b', 'c')
('b', 'd')
('b', 'e')
('b', 'f')
('c', 'e')
('c', 'f')
('d', 'e')
('d', 'f')
False
True

If you want the arguments as a dictionary:
def kw(**kwargs):
    for key, value in kwargs.items():
        print key, value
If you want all the arguments as a list:
def arg(*args):
    for item in args:
        print item
You can use both:
def using_both(*args, **kwargs):
    kw(**kwargs)  # unpack again; kw(kwargs) would raise TypeError
    arg(*args)    # unpack again; arg(args) would print the whole tuple as one item
Call it like this:
using_both([1,2,3,4,5],a=32,b=55)

Is there a 'multimap' implementation in Python?

I am new to Python, and I am familiar with implementations of Multimaps in other languages. Does Python have such a data structure built-in, or available in a commonly-used library?
To illustrate what I mean by "multimap":
a = multidict()
a[1] = 'a'
a[1] = 'b'
a[2] = 'c'
print(a[1]) # prints: ['a', 'b']
print(a[2]) # prints: ['c']

Such a thing is not present in the standard library. You can use a defaultdict though:
>>> from collections import defaultdict
>>> md = defaultdict(list)
>>> md[1].append('a')
>>> md[1].append('b')
>>> md[2].append('c')
>>> md[1]
['a', 'b']
>>> md[2]
['c']
(Instead of list you may want to use set, in which case you'd call .add instead of .append.)
As an aside: look at these two lines you wrote:
a[1] = 'a'
a[1] = 'b'
This seems to indicate that you want the expression a[1] to be equal to two distinct values. This is not possible with dictionaries because their keys are unique and each of them is associated with a single value. What you can do, however, is extract all values inside the list associated with a given key, one by one. You can use iter followed by successive calls to next for that. Or you can just use two loops:
>>> for k, v in md.items():
...     for w in v:
...         print("md[%d] = '%s'" % (k, w))
...
md[1] = 'a'
md[1] = 'b'
md[2] = 'c'
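The set variant mentioned in the parenthetical above might look like the following sketch. It also shows one behaviour worth knowing about defaultdict: a membership test does not create a key, whereas indexing a missing key does.

```python
from collections import defaultdict

md = defaultdict(set)
md[1].add('a')
md[1].add('b')
md[1].add('a')  # a duplicate value is ignored by the set
md[2].add('c')

print(sorted(md[1]), sorted(md[2]))
print(3 in md)  # membership tests do not create the key; md[3] would
```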

Just for future visitors: there is now a Python implementation of Multimap, available via PyPI.

Stephan202 has the right answer, use defaultdict. But if you want something with the interface of C++ STL multimap and much worse performance, you can do this:
multimap = []
multimap.append( (3,'a') )
multimap.append( (2,'x') )
multimap.append( (3,'b') )
multimap.sort()
Now when you iterate through multimap, you'll get pairs like you would in a std::multimap. Unfortunately, that means your loop code will start to look as ugly as C++.
def multimap_iter(multimap,minkey,maxkey=None):
    maxkey = minkey if (maxkey is None) else maxkey
    for k,v in multimap:
        if k<minkey: continue
        if k>maxkey: break
        yield k,v
# this will print 'a','b'
for k,v in multimap_iter(multimap,3,3):
    print v
In summary, defaultdict is really cool and leverages the power of python and you should use it.
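If the sorted-list layout is kept anyway, the linear scan can be replaced by a binary search for the first matching key. This is a sketch using the standard bisect module; values_for is a hypothetical helper name, and it assumes the list is already sorted and that keys are mutually comparable:

```python
from bisect import bisect_left

def values_for(pairs, key):
    """Yield the values stored under `key` in a sorted list of (key, value) tuples."""
    # The 1-tuple (key,) sorts before every (key, value) tuple, so this finds the first match
    i = bisect_left(pairs, (key,))
    while i < len(pairs) and pairs[i][0] == key:
        yield pairs[i][1]
        i += 1

multimap = sorted([(3, 'a'), (2, 'x'), (3, 'b')])
print(list(values_for(multimap, 3)))
```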

You can take a list of tuples and then sort them as if it were a multimap.
listAsMultimap=[]
Let's append some elements (tuples):
listAsMultimap.append((1,'a'))
listAsMultimap.append((2,'c'))
listAsMultimap.append((3,'d'))
listAsMultimap.append((2,'b'))
listAsMultimap.append((5,'e'))
listAsMultimap.append((4,'d'))
Now sort it.
listAsMultimap=sorted(listAsMultimap)
After printing it you will get:
[(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd'), (4, 'd'), (5, 'e')]
That means it is working as a Multimap!
Please note that like multimap here values are also sorted in ascending order if the keys are the same (for key=2, 'b' comes before 'c' although we didn't append them in this order.)
If you want to get them in descending order just change the sorted() function like this:
listAsMultimap=sorted(listAsMultimap,reverse=True)
And after you will get output like this:
[(5, 'e'), (4, 'd'), (3, 'd'), (2, 'c'), (2, 'b'), (1, 'a')]
Similarly here values are in descending order if the keys are the same.

The standard way to write this in Python is with a dict whose elements are each a list or set. As stephan202 says, you can somewhat automate this with a defaultdict, but you don't have to.
In other words I would translate your code to
a = dict()
a[1] = ['a', 'b']
a[2] = ['c']
print(a[1]) # prints: ['a', 'b']
print(a[2]) # prints: ['c']

Or subclass dict:
class Multimap(dict):
    def __setitem__(self, key, value):
        if key not in self:
            dict.__setitem__(self, key, [value]) # call super method to avoid recursion
        else:
            self[key].append(value)

There is no multi-map in the Python standard libs currently.
WebOb has a MultiDict class used to represent HTML form values, and it is used by a few Python Web frameworks, so the implementation is battle tested.
Werkzeug also has a MultiDict class, and for the same reason.
