Idiomatic Python has_one

I just invented a stupid little helper function:
def has_one(seq, predicate=bool):
    """Return whether there is exactly one item in `seq` that matches
    `predicate`, with a minimum of evaluation (short-circuit).
    """
    iterator = (item for item in seq if predicate(item))
    try:
        next(iterator)
    except StopIteration:  # No items match predicate.
        return False
    try:
        next(iterator)
    except StopIteration:  # Exactly one item matches predicate.
        return True
    return False  # More than one item matches the predicate.
Because the most readable/idiomatic inline thing I could come up with was:
[predicate(item) for item in seq].count(True) == 1
... which is fine in my case because I know seq is small, but it just feels weird. Is there an idiom I’m forgetting here that prevents me from having to break out this helper?
Clarification
Looking back on it, this was kind of a crappily posed question, though we got some excellent answers! I was looking for either:
An obvious and readable inline idiom or stdlib function, eager evaluation being acceptable in this case.
A more obvious and readable helper function -- since it's breaking out a whole other function, only the minimum amount of evaluation seems acceptable.
@Stephan202 came up with a really cool idiom for the helper function and @Martin v. Löwis came up with a simpler inline idiom under the assumption that the predicate returns a bool. Thanks everybody for your help!

How about calling any twice, on an iterator (Python 2.x and 3.x compatible)?
>>> def has_one(seq, predicate=bool):
...     seq = (predicate(e) for e in seq)
...     return any(seq) and not any(seq)
...
>>> has_one([])
False
>>> has_one([1])
True
>>> has_one([0])
False
>>> has_one([1, 2])
False
any will take at most one element which evaluates to True from the iterator. If it succeeds the first time and fails the second time, then only one element matches the predicate.
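To make the short-circuiting visible, you can instrument the predicate with a counter (a quick sketch of my own; `counting_pred` is just an illustrative name, and the helper is restated so the snippet is self-contained):

```python
def has_one(seq, predicate=bool):
    seq = (predicate(e) for e in seq)
    return any(seq) and not any(seq)

calls = []

def counting_pred(x):
    # Record every element the predicate is actually asked about.
    calls.append(x)
    return x > 0

# On [1, 2, 0, 0] the first any() stops at 1 and the second stops at 2,
# so the trailing zeros are never examined.
result = has_one([1, 2, 0, 0], counting_pred)
print(result, calls)  # False [1, 2]
```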
Edit: I see Robert Rossney suggests a generalized version, which checks whether exactly n elements match the predicate. Let me join in on the fun, using all:
>>> def has_n(seq, n, predicate=bool):
...     seq = (predicate(e) for e in seq)
...     return all(any(seq) for _ in range(n)) and not any(seq)
...
>>> has_n(range(0), 3)
False
>>> has_n(range(3), 3)
False
>>> has_n(range(4), 3)
True
>>> has_n(range(5), 3)
False

Perhaps something like this is more to your taste?
def has_one(seq, predicate=bool):
    nwanted = 1
    n = 0
    for item in seq:
        if predicate(item):
            n += 1
            if n > nwanted:
                return False
    return n == nwanted
This is rather like the list comprehension example, but requires only one pass over one sequence. Compared to the second has_one function, and like the list comprehension code, it generalizes more easily to other counts. I've demonstrated this (hopefully without error...) by adding in a variable for the number of items wanted.

I liked Stephan202's answer, but I like this one a little more, even though it's two lines instead of one. I like it because it's just as crazy but a tiny bit more explicit about how its craziness works:
def has_one(seq):
    g = (x for x in seq)
    return any(g) and not any(g)
Edit:
Here's a more generalized version that supports a predicate:
def has_exactly(seq, count, predicate=bool):
    g = (predicate(x) for x in seq)
    while count > 0:
        if not any(g):
            return False
        count -= 1
    return not any(g)

Not sure whether it is any better than the versions you proposed, however...
If predicate is guaranteed to return True/False only, then
sum(map(predicate, seq)) == 1
will do (although it won't stop at the second element)
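If you like the sum-based idiom but want the short-circuiting back, one possible variation (my own sketch, not from the original answer) caps the generator with itertools.islice so evaluation stops as soon as a second match appears:

```python
from itertools import islice

def has_one(seq, predicate=bool):
    # Take at most two matches; islice stops consuming seq once a
    # second match has been produced.
    matches = islice((1 for item in seq if predicate(item)), 2)
    return sum(matches) == 1

print(has_one([0, 3, 0]))  # True
print(has_one([1, 2, 0]))  # False
print(has_one([]))         # False
```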

How about ...
import functools
import operator
def exactly_one(seq):
    """
    Handy for ensuring that exactly one of a bunch of options has been set.
    >>> exactly_one((3, None, 'frotz', None))
    False
    >>> exactly_one((None, None, 'frotz', None))
    True
    """
    # The initializer 0 keeps reduce() from raising on an empty sequence.
    return 1 == functools.reduce(operator.add, [1 for x in seq if x], 0)

Look, Ma! No rtfm("itertools"), no dependency on predicate() returning a boolean, minimum evaluation, just works!
Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def count_in_bounds(seq, predicate=lambda x: x, low=1, high=1):
...     count = 0
...     for item in seq:
...         if predicate(item):
...             count = count + 1
...             if count > high:
...                 return 0
...     return count >= low
...
...
>>> seq1 = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]
>>> count_in_bounds(seq1)
0
>>> count_in_bounds(seq1, low=3, high=3)
1
>>> count_in_bounds(seq1, low=3, high=4)
1
>>> count_in_bounds(seq1, low=4, high=4)
0
>>> count_in_bounds(seq1, low=0, high=3)
1
>>> count_in_bounds(seq1, low=3, high=3)
1
>>>

Here's a modified version of @Stephan202's answer:
from itertools import imap, repeat

def exactly_n_is_true(iterable, n, predicate=None):
    it = iter(iterable) if predicate is None else imap(predicate, iterable)
    return all(any(it) for _ in repeat(None, n)) and not any(it)
Differences:
predicate() is None by default. The meaning is the same as for built-in filter() and stdlib's itertools.ifilter() functions.
More explicit function and parameters names (this is subjective).
repeat() allows large n to be used.
Example:
if exactly_n_is_true(seq, 1, predicate):
    # predicate() is true for exactly one item from the seq
    ...

The straightforward counting-loop solutions are definitely the clearest.
For the sport of it, here is a variation on the any(g) and not any(g) theme that looks less magic on the surface - but it's actually similarly fragile when one comes to debug/modify it (you can't exchange the order, and you have to understand how short-circuiting hands a single iterator off between two consumers...).
def cumulative_sums(values):
    s = 0
    for v in values:
        s += v
        yield s

def count_in_bounds(iterable, start=1, stop=2):
    counter = cumulative_sums(bool(x) for x in iterable)
    return (start in counter) and (stop not in counter)
It's trivial to also take a predicate instead of bool but I think it's better to follow any() and all() in leaving that to the caller - it's easy to pass a generator expression if needed.
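A quick check of the helper, with the definitions restated so the snippet is self-contained:

```python
def cumulative_sums(values):
    s = 0
    for v in values:
        s += v
        yield s

def count_in_bounds(iterable, start=1, stop=2):
    # Each `in` test consumes the shared generator of running counts:
    # the first stops once the count reaches `start`, the second then
    # verifies the count never reaches `stop`.
    counter = cumulative_sums(bool(x) for x in iterable)
    return (start in counter) and (stop not in counter)

print(count_in_bounds([0, 1, 0]))        # True: exactly one truthy item
print(count_in_bounds([1, 1]))           # False: two truthy items
print(count_in_bounds([1, 1, 1], 2, 4))  # True: the count 3 lies in [2, 4)
```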
Taking arbitrary [start, stop) is a nice bonus, but it's not as generic as I'd like. It's tempting to pass stop=None to emulate e.g. any(), which works, but always consumes all input; the proper emulation is kinda awkward:
def any(iterable):
    return not count_in_bounds(iterable, 0, 1)

def all(iterable):
    return count_in_bounds((not x for x in iterable), 0, 1)
Taking a variable number of bounds and specifying which should return True/False would get out of hand.
Perhaps a simple saturating counter is the best primitive:
def count_true(iterable, stop_at=float('inf')):
    c = 0
    for x in iterable:
        c += bool(x)
        if c >= stop_at:
            break
    return c

def any(iterable):
    return count_true(iterable, 1) >= 1

def exactly_one(iterable):
    return count_true(iterable, 2) == 1

def weird(iterable):
    return count_true(iterable, 10) in {2, 3, 5, 7}
all() still requires negating the inputs, or a matching count_false() helper.
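A sketch of what that matching count_false() helper could look like (my naming; `all_true` is used to avoid shadowing the builtin):

```python
def count_false(iterable, stop_at=float('inf')):
    # Mirror image of count_true: count falsy items, saturating at stop_at.
    c = 0
    for x in iterable:
        c += not x
        if c >= stop_at:
            break
    return c

def all_true(iterable):
    # all() means "no falsy item", so we can stop at the first one found.
    return count_false(iterable, 1) == 0

print(all_true([1, 2, 3]))  # True
print(all_true([1, 0, 3]))  # False
```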

Related

How to check if an ordered non-consecutive subsequence is in array? Python

I'd be surprised if this hasn't been asked yet.
Let's say I have an array [5,6,7,29,34] and I want to check if the sequence 5,6,7 appears in it (which it does). Order does matter.
How would I do this?
Just for fun, here is a quick (very quick) and dirty (very dirty) solution (that is somewhat flawed, so don't really use this):
>>> str([5,6,7]).strip('[]') in str([5,6,7,29,34])
True
The RightWay™ is likely to use list.index() to find candidate matches for the first element and then verify the full match with slicing and list equality:
>>> def issubsequence(sub, seq):
...     i = -1
...     while True:
...         try:
...             i = seq.index(sub[0], i+1)  # locate first element
...         except ValueError:
...             return False
...         if seq[i : i+len(sub)] == sub:  # verify full match
...             return True
...
>>> issubsequence([5, 6, 7], [5,6,7,29,34])
True
>>> issubsequence([5, 20, 7], [5,6,7,29,34])
False
Edit: The OP clarified in a comment that the subsequence must be in order but need not be in consecutive positions. That has a different and much more complicated solution which was already answered here: How do you check if one array is a subsequence of another?
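For reference, the in-order-but-not-necessarily-contiguous variant has a well-known compact idiom built on a single shared iterator (a sketch of my own, not taken from the linked answer):

```python
def is_in_order(sub, seq):
    # Each `in` search resumes where the previous one stopped, so the
    # elements of `sub` must be found in order (gaps allowed).
    it = iter(seq)
    return all(x in it for x in sub)

print(is_in_order([5, 7, 34], [5, 6, 7, 29, 34]))  # True
print(is_in_order([7, 5], [5, 6, 7, 29, 34]))      # False
```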
Here is a good solution:
def is_sublist(a, b):
    if not a: return True
    if not b: return False
    return b[:len(a)] == a or is_sublist(a, b[1:])
As mentioned by Stefan Pochmann this can be rewritten as:
def is_sublist(a, b):
    return b[:len(a)] == a or bool(b) and is_sublist(a, b[1:])
Here's a solution that works (efficiently!) on any pair of iterable objects:
import collections
from itertools import islice, tee

def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume entirely."""
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

def is_slice(seq, subseq):
    """Returns whether subseq is a contiguous subsequence of seq."""
    subseq = tuple(subseq)  # len(subseq) is needed, so we make it a tuple.
    seq_window = tee(seq, len(subseq))
    for steps, it in enumerate(seq_window):
        # advance each iterator to point to subsequent values in seq.
        consume(it, n=steps)
    return any(subseq == seq_slice for seq_slice in zip(*seq_window))
consume comes from itertools recipes.

Repetition inside of lists

I want to build a function that will return True if any two items in a list are the same.
For example, [1,7,3,7,4] should return True and ["one","ONE","One"] should return False.
I need help with which parts of python look for duplicates.
Loop over the values and use a set to track what you have already seen. As soon as you see a value again, return True:
def has_duplicates(lst):
    seen = set()
    for elem in lst:
        if elem in seen:
            return True
        seen.add(elem)
    return False
This is very efficient in that it short-circuits; it won't loop over the whole list if a duplicate has been detected early on.
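To illustrate the short-circuiting, the same function (restated here) even terminates on an infinite iterator, provided a duplicate eventually appears (a quick sketch using itertools.cycle):

```python
from itertools import cycle

def has_duplicates(lst):
    seen = set()
    for elem in lst:
        if elem in seen:
            return True
        seen.add(elem)
    return False

# cycle([1, 2, 3]) yields 1, 2, 3, 1, 2, 3, ... forever; the function
# returns as soon as 1 comes around the second time.
print(has_duplicates(cycle([1, 2, 3])))  # True
```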
Martijn's answer is the best, but with a few caveats, this one is worth a try.
>>> chk = lambda l: len(l) != len(set(l))  # check the length after removing dupes.
>>> l = [1,7,3,7,4]
>>> chk(l)
True
>>> l = ["one","ONE","One"]
>>> chk(l)
False
Note - As Martijn mentions in a comment, this is a slower process.
Using a collections.Counter dict:
from collections import Counter
def has_dupes(l):
    # if the most repeated key's count is > 1 we have at least one dupe
    return Counter(l).most_common(1)[0][1] > 1
Or use any:
def has_dupes(l):
    return any(v > 1 for v in Counter(l).values())

Fast and pythonic way to find out if an anagram is a palindrome?

Given a string, how do we check if any anagram of it can be a palindrome?
For example let us consider the string "AAC". An anagram of it is "ACA" which is a palindrome. We have to write a method which takes a string and outputs true if we can form a palindrome from any anagram of the given string. Otherwise outputs false.
This is my current solution:
from collections import defaultdict
def check(s):
    newdict = defaultdict(int)
    for e in s:
        newdict[e] += 1
    times = 0
    for e in newdict.values():
        if times == 2:
            return False
        if e == 1:
            times += 1
    return True
Any shorter solutions using the python library?
Here is shorter solution that uses the standard library, with a corrected algorithm (all the character counts must be even, except for at most one):
from collections import Counter
def check(s):
    return sum(1 for count in Counter(s).values() if count % 2 == 1) <= 1
This is short but "slow", as the program goes through all the odd counts instead of stopping as soon as two are found. A faster solution that stops as soon as possible, is:
def check(s):
    odd_counts = (count for count in Counter(s).values() if count % 2 == 1)
    try:
        next(odd_counts)  # Fails if there is no odd count
        next(odd_counts)  # Fails if there is only one odd count
    except StopIteration:
        return True
    else:
        return False
This is probably a better fit for code golfing, but it is quite trivial.
Observe that a palindrome needs balanced sides, so you generally need an even number of occurrences of each character. However, a single odd character can sit in the middle, so at most one character may have an odd count. This can be checked with a single list comprehension:
>>> from collections import Counter
>>> def is_palindrome(letters):
...     return len([v for v in Counter(letters).values() if v % 2]) <= 1
...
>>> is_palindrome('level')
True
>>> is_palindrome('levels')
False
>>> is_palindrome('levelss')
True
Oh wait, someone else beat with a solution, but that's what I got.
Without using Counter:
>>> def isit(s):
...     ls = [x % 2 for x in [s.count(x) for x in set(s)]]
...     return not any(ls) or ls.count(1) == 1
...
>>> isit('abc')
False
>>> isit('abb')
True
>>> isit('abbd')
False
>>> isit('abbdd')
True
>>> isit('abbdda')
True
>>>
Even though it's not algorithmically the best, (if your strings are not extremely long it's not a problem), I'd like to provide a more readable solution.
from itertools import permutations

def has_palindrome(s):
    return any(c == c[::-1] for c in permutations(s, len(s)))

Is there a Python builtin for determining if an iterable contained a certain sequence?

For example, something like:
>>> [1, 2, 3].contains_sequence([1, 2])
True
>>> [1, 2, 3].contains_sequence([4])
False
I know that the in operator can do this for strings:
>>> "12" in "123"
True
But I'm looking for something that operates on iterables.
Referenced from https://stackoverflow.com/a/6822773/24718, modified to use a list.
from itertools import islice

def window(seq, n=2):
    """
    Returns a sliding window (of width n) over data from the iterable:
    s -> (s0, s1, ..., s[n-1]), (s1, s2, ..., sn), ...
    """
    it = iter(seq)
    result = list(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + [elem]
        yield result

def contains_sequence(all_values, seq):
    return any(seq == current_seq for current_seq in window(all_values, len(seq)))

test_iterable = [1, 2, 3]
search_sequence = [1, 2]
result = contains_sequence(test_iterable, search_sequence)  # True
Is there a Python builtin? No. You can accomplish this task in various ways. Here is a recipe that does it, and also gives you the position of the subsequence in the containing sequence:
def _search(forward, source, target, start=0, end=None):
    """Naive search for target in source."""
    m = len(source)
    n = len(target)
    if end is None:
        end = m
    else:
        end = min(end, m)
    if n == 0 or (end - start) < n:
        # target is empty, or longer than source, so obviously can't be found.
        return None
    if forward:
        x = range(start, end - n + 1)
    else:
        x = range(end - n, start - 1, -1)
    for i in x:
        if source[i:i+n] == target:
            return i
    return None
As far as I know, there's no way to do this. You can roll your own function pretty easily, but I doubt that will be terribly efficient.
>>> def contains_seq(seq, subseq):
...     #try: junk = seq[:]
...     #except: seq = tuple(seq)
...     #try: junk = subseq[:]
...     #except: subseq = tuple(subseq)
...     ll = len(subseq)
...     for i in range(len(seq) - ll + 1):  # on python2, use xrange.
...         if seq[i:i+ll] == subseq:
...             return True
...     return False
...
>>> contains_seq(range(10), range(3))  # True
>>> contains_seq(range(10), [2, 3, 6])  # False
Note that this solution does not work with generator type objects (it only works on objects that you can slice). You could check seq to see if it is sliceable before proceeding and cast to a tuple if it isn't sliceable -- But then you get rid of the benefits of slicing. You could re-write it to check one element at a time instead of using slicing, but I have a feeling performance would suffer even more.
If preserving of order is not necessary, you can use sets (builtin):
>>> set([1,2]).issubset([1,2,3])
True
>>> set([4]).issubset([1,2,3])
False
Otherwise:
def is_subsequence(sub, iterable):
    sub_pos, sub_len = 0, len(sub)
    for i in iterable:
        if i == sub[sub_pos]:
            sub_pos += 1
            if sub_pos >= sub_len:
                return True
        else:
            # Restart, but re-check the current element against the start
            # of the pattern. (Patterns with repeated prefixes, e.g.
            # [1, 1, 2] in [1, 1, 1, 2], still need KMP-style backtracking.)
            sub_pos = 1 if i == sub[0] else 0
    return False
>>> is_subsequence([1,2], [0,1,2,3,4])
True
>>> is_subsequence([2,1], [0,1,2,3,4]) # order preserved
False
>>> is_subsequence([1,2,4], [0,1,2,3,4])
False
This one works with any iterator.
As others have said, there's no builtin for this. Here's an implementation that is potentially more efficient than the other answers I've seen -- in particular, it scans through the iterable, just keeping track of what prefix sizes of the target sequence it's seen. But that increased efficiency comes at some expense in increased verbosity over some of the other approaches that have been suggested.
import collections.abc

def contains_seq(iterable, seq):
    """
    Returns true if the iterable contains the given sequence.
    """
    # The following clause is optional -- leave it if you want to allow `seq` to
    # be an arbitrary iterable; or remove it if `seq` will always be list-like.
    if not isinstance(seq, collections.abc.Sequence):
        seq = tuple(seq)
    if len(seq) == 0:
        return True  # corner case
    partial_matches = []
    for elt in iterable:
        # Try extending each of the partial matches by adding the
        # next element, if it matches.
        partial_matches = [m+1 for m in partial_matches if elt == seq[m]]
        # Check if we should start a new partial match
        if elt == seq[0]:
            partial_matches.append(1)
        # Check if we have a complete match (partial_matches will always
        # be sorted from highest to lowest, since older partial matches
        # come before newer ones).
        if partial_matches and partial_matches[0] == len(seq):
            return True
    # No match found.
    return False
deque appears to be useful here:
from collections import deque

def contains(it, seq):
    seq = deque(seq)
    deq = deque(maxlen=len(seq))
    for p in it:
        deq.append(p)
        if deq == seq:
            return True
    return False
Note that this accepts arbitrary iterables for both arguments (no slicing required).
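For instance, since both arguments are consumed element by element, you can feed it plain generators (a quick sketch restating the function above):

```python
from collections import deque

def contains(it, seq):
    # Slide a fixed-size window (deque with maxlen) over `it` and compare
    # it against the target sequence at every step.
    seq = deque(seq)
    deq = deque(maxlen=len(seq))
    for p in it:
        deq.append(p)
        if deq == seq:
            return True
    return False

# Works on generators directly; no slicing is needed:
print(contains((n * n for n in range(10)), [4, 9, 16]))  # True
print(contains(iter([1, 2, 3]), [3, 2]))                 # False
```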
As there's no builtin, I made a nice version:
import itertools as it

def contains(seq, sub):
    seq = iter(seq)
    o = object()
    return any(all(i == j for i, j in
                   zip(sub, it.chain((n,), seq, (o for i in it.count()))))
               for n in seq)
This does not require any extra lists (if you use it.izip on Python 2, or plain zip on Py3k).
>>> contains([1,2,3], [1,2])
True
>>> contains([1,2,3], [1,2,3])
True
>>> contains([1,2,3], [2,3])
True
>>> contains([1,2,3], [2,3,4])
False
Extra points if you have no trouble reading it. (It does the job, but the implementation is not to be taken too seriously.) ;)
You could convert it into a string and then do matching on it
full_list = " ".join([str(x) for x in [1, 2, 3]])
seq = " ".join([str(x) for x in [1, 2]])
seq in full_list
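Beware that the string round-trip can report false positives whenever one element's text appears inside another's (my example):

```python
# [1, 2] is NOT a contiguous subsequence of [11, 2, 3], yet the string
# test matches, because "11 2 3" contains the text "1 2".
full_list = " ".join(str(x) for x in [11, 2, 3])
seq = " ".join(str(x) for x in [1, 2])
print(seq in full_list)  # True -- a false positive
```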

How to get the nth element of a python list or a default if not available

In Python, how can I simply do the equivalent of dictionary.get(key, default) for lists - i.e., how can I simply get the nth element of a list, or a default value if not available?
For example, given a list myList, how can I get 5 if myList is empty, or myList[0] otherwise?
l[index] if index < len(l) else default
To support negative indices we can use:
l[index] if -len(l) <= index < len(l) else default
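A quick sanity check of the negative-index guard (my example values):

```python
l = [10, 20, 30]
default = -1

index = -2  # in range on the negative side
print(l[index] if -len(l) <= index < len(l) else default)  # 20

index = -5  # out of range on the negative side
print(l[index] if -len(l) <= index < len(l) else default)  # -1
```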
try:
    a = b[n]
except IndexError:
    a = default
Edit: I removed the check for TypeError - probably better to let the caller handle this.
(a[n:]+[default])[0]
This is probably better as a gets larger
(a[n:n+1]+[default])[0]
This works because a[n:n+1] is an empty list if n >= len(a)
Here is an example of how this works with range(5)
>>> range(5)[3:4]
[3]
>>> range(5)[4:5]
[4]
>>> range(5)[5:6]
[]
>>> range(5)[6:7]
[]
And the full expression
>>> (range(5)[3:4]+[999])[0]
3
>>> (range(5)[4:5]+[999])[0]
4
>>> (range(5)[5:6]+[999])[0]
999
>>> (range(5)[6:7]+[999])[0]
999
Just discovered that:
next(iter(myList), 5)
iter(myList) returns an iterator over myList, and next() consumes the first element of the iterator; it raises a StopIteration error unless called with a default value, which is the case here: the second argument, 5.
This only works when you want the 1st element, which is the case in your example, but not in the text of your question, so...
Additionally, it does not need to create temporary lists in memory, and it works for any kind of iterable, even if it does not have a name (see Xiong Chiamiov's comment on gruszczy's answer).
(L[n:n+1] or [somedefault])[0]
... looking for an equivalent in python of dict.get(key, default) for lists
There is an itertools recipe that does this for general iterables. For convenience, you can pip install more_itertools and import this third-party library that implements such recipes for you:
Code
import more_itertools as mit
mit.nth([1, 2, 3], 0)
# 1
mit.nth([], 0, 5)
# 5
Detail
Here is the implementation of the nth recipe:
import itertools

def nth(iterable, n, default=None):
    "Returns the nth item or a default value"
    return next(itertools.islice(iterable, n, None), default)
Like dict.get(), this tool returns a default for missing indices. It applies to general iterables:
mit.nth((0, 1, 2), 1) # tuple
# 1
mit.nth(range(3), 1) # range generator (py3)
# 1
mit.nth(iter([0, 1, 2]), 1) # list iterator
# 1
Using Python 3.4's contextlib.suppress(exceptions) to build a getitem() method similar to getattr().
import contextlib

def getitem(iterable, index, default=None):
    """Return iterable[index] or default if IndexError is raised."""
    with contextlib.suppress(IndexError):
        return iterable[index]
    return default
A cheap solution is to really make a dict with enumerate and use .get() as usual, like
dict(enumerate(l)).get(7, my_default)
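Note that this builds a dictionary over the entire list just to fetch one element, so it is O(n) in time and memory; it also treats negative indices differently from plain indexing (a quick sketch with my own example values):

```python
l = ['a', 'b', 'c']
my_default = 'z'

print(dict(enumerate(l)).get(1, my_default))   # 'b'
print(dict(enumerate(l)).get(7, my_default))   # 'z'
# Unlike l[-1], a negative index falls back to the default here,
# because enumerate() only produces non-negative keys:
print(dict(enumerate(l)).get(-1, my_default))  # 'z'
```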
After reading through the answers, I'm going to use:
(L[n:] or [somedefault])[0]
Combining @Joachim's with the above, you could use
next(iter(my_list[index:index+1]), default)
Examples:
>>> next(iter(range(10)[8:9]), 11)
8
>>> next(iter(range(10)[12:13]), 11)
11
Or, maybe more clear, but without the len
my_list[index] if my_list[index:index + 1] else default
Although this is not a one-liner solution, you can define a function with a default value like so:
def get_val(myList, idx, default=5):
    try:
        return myList[idx]
    except IndexError:
        return default
With unpacking:
b, = a[n:n+1] or [default]
For a small index, such as when parsing up to k arguments, I'd build a new list of length k, with added elements set to d, as follows:
def fill(l, k, d):
    return l + (k - len(l)) * [d]
Typical usage:
N = 2
arg_one, arg_two = fill("first_word and the rest".split(maxsplit=N - 1), N, None)
# arg_one == "first_word"
# arg_two == "and the rest"
Same example, with a short list:
arg_one, arg_two = fill("one_word_only".split(maxsplit=N - 1), N, None)
# arg_one == "one_word_only"
# arg_two == None
