How to count non-null elements in an iterable? - python

I'm looking for a better/more Pythonic solution for the following snippet
count = sum(1 for e in iterable if e)

len(list(filter(None, iterable)))
Using None as the predicate for filter just says to use the truthiness of the items. (maybe clearer would be len(list(filter(bool, iterable))))

Honestly, I can't think of a better way to do it than what you've got.
Well, I guess people could argue about "better," but I think you're unlikely to find anything shorter, simpler, and clearer.

Most Pythonic is to write a small auxiliary function and put it in your trusty "utilities" module (or submodule of appropriate package, when you have enough;-):
import itertools as it
def count(iterable):
"""Return number of items in iterable."""
return sum(1 for _ in iterable)
def count_conditional(iterable, predicate=None):
"""Return number of items in iterable that satisfy the predicate."""
return count(it.ifilter(predicate, iterable))
Exactly how you choose to implement these utilities is less important (you could choose at any time to recode some of them in Cython, for example, if some profiling on an application using the utilities shows it's useful): the key thing is having them as your own useful library of utility functions, with names and calling patterns you like, to make your all-important application level code clearer, more readable, and more concise that if you stuffed it full of inlined contortions!-)

sum(not not e for e in iterable)

As stated in the comments, the title is somewhat dissonant with the question. If we would insist on the title, i.e. counting non-null elements, the OP's solution could be modified to:
count = sum(1 for e in iterable if e is not None)

This isn't the fastest, but maybe handy for code-golf
sum(map(bool, iterable))

Propably the most Pythonic way is to write code that does not need count function.
Usually fastest is to write the style of functions that you are best with and continue to refine your style.
Write Once Read Often code.
By the way your code does not do what your title says! To count not 0 elements is not simple considering rounding errors in floating numbers, that False is 0..
If you have not floating point values in list, this could do it:
def nonzero(seq):
return (item for item in seq if item!=0)
seq = [None,'', 0, 'a', 3,[0], False]
print seq,'has',len(list(nonzero(seq))),'non-zeroes'
print 'Filter result',len(filter(None, seq))
"""Output:
[None, '', 0, 'a', 3, [0], False] has 5 non-zeroes
Filter result 3
"""

If you're just trying to see whether the iterable is not empty, then this would probably help:
def is_iterable_empty(it):
try:
iter(it).next()
except StopIteration:
return True
else:
return False
The other answers will take O(N) time to complete (and some take O(N) memory; good eye, John!). This function takes O(1) time. If you really need the length, then the other answers will help you more.

Here is a solution with O(n) runtime and O(1) additional memory:
count = reduce(lambda x,y:x+y, imap(lambda v: v>0, iterable))
Hope that helps!

Related

Find all the possible N-length anagrams - fast alternatives

I am given a sequence of letters and have to produce all the N-length anagrams of the sequence given, where N is the length of the sequence.
I am following a kinda naive approach in python, where I am taking all the permutations in order to achieve that. I have found some similar threads like this one but I would prefer a math-oriented approach in Python. So what would be a more performant alternative to permutations? Is there anything particularly wrong in my attempt below?
from itertools import permutations
def find_all_anagrams(word):
pp = permutations(word)
perm_set = set()
for i in pp:
perm_set.add(i)
ll = [list(i) for i in perm_set]
ll.sort()
print(ll)
If there are lots of repeated letters, the key will be to produce each anagram only once instead of producing all possible permutations and eliminating duplicates.
Here's one possible algorithm which only produces each anagram once:
from collections import Counter
def perm(unplaced, prefix):
if unplaced:
for element in unplaced:
yield from perm(unplaced - Counter(element), prefix + element)
else:
yield prefix
def permutations(iterable):
yield from perm(Counter(iterable), "")
That's actually not much different from the classic recursion to produce all permutations; the only difference is that it uses a collections.Counter (a multiset) to hold the as-yet-unplaced elements instead of just using a list.
The number of Counter objects produced in the course of the iteration is certainly excessive, and there is almost certainly a faster way of writing that; I chose this version for its simplicity and (hopefully) its clarity
This is very slow for long words with many similar characters. Slow compared to theoretical maximum performance that is. For example, permutations("mississippi") will produce a much longer list than necessary. It will have a length of 39916800, but but the set has a size of 34650.
>>> len(list(permutations("mississippi")))
39916800
>>> len(set(permutations("mississippi")))
34650
So the big flaw with your method is that you generate ALL anagrams and then remove the duplicates. Use a method that only generates the unique anagrams.
EDIT:
Here is some working, but extremely ugly and possibly buggy code. I'm making it nicer as you're reading this. It does give 34650 for mississippi, so I assume there aren't any major bugs. Warning again. UGLY!
# Returns a dictionary with letter count
# get_letter_list("mississippi") returns
# {'i':4, 'm':1, 'p': 2, 's':4}
def get_letter_list(word):
w = sorted(word)
c = 0
dd = {}
dd[w[0]]=1
for l in range(1,len(w)):
if w[l]==w[l-1]:
d[c]=d[c]+1
dd[w[l]]=dd[w[l]]+1
else:
c=c+1
d.append(1)
dd[w[l]]=1
return dd
def sum_dict(d):
s=0
for x in d:
s=s+d[x]
return s
# Recursively create the anagrams. It takes a letter list
# from the above function as an argument.
def create_anagrams(dd):
if sum_dict(dd)==1: # If there's only one letter left
for l in dd:
return l # Ugly hack, because I'm not used to dics
a = []
for l in dd:
if dd[l] != 0:
newdd=dict(dd)
newdd[l]=newdd[l]-1
if newdd[l]==0:
newdd.pop(l)
newl=create(newdd)
for x in newl:
a.append(str(l)+str(x))
return a
>>> print (len(create_anagrams(get_letter_list("mississippi"))))
34650
It works like this: For every unique letter l, create all unique permutations with one less occurance of the letter l, and then append l to all these permutations.
For "mississippi", this is way faster than set(permutations(word)) and it's far from optimally written. For instance, dictionaries are quite slow and there's probably lots of things to improve in this code, but it shows that the algorithm itself is much faster than your approach.
Maybe I am missing something, but why don't you just do this:
from itertools import permutations
def find_all_anagrams(word):
return sorted(set(permutations(word)))
You could simplify to:
from itertools import permutations
def find_all_anagrams(word):
word = set(''.join(sorted(word)))
return list(permutations(word))
In the doc for permutations the code is detailled and it seems already optimized.
I don't know python but I want to try to help you: probably there are a lot of other more performant algorithm, but I've thought about this one: it's completely recursive and it should cover all the cases of a permutation. I want to start with a basic example:
permutation of ABC
Now, this algorithm works in this way: for Length times you shift right the letters, but the last letter will become the first one (you could easily do this with a queue).
Back to the example, we will have:
ABC
BCA
CAB
Now you repeat the first (and only) step with the substring built from the second letter to the last one.
Unfortunately, with this algorithm you cannot consider permutation with repetition.

How could I use map() in this code?

I have a string list
[str1, str2, str3.....]
and I also have a def to check the format of the strings, something like:
def CheckIP(strN):
if(formatCorrect(strN)):
return True
return False
Now I want to check every string in list, and of course I can use for to check one by one. But could I use map() to make code more readable...?
You can map your list to your function and then use all to check if it returns True for every item:
if all(map(CheckIP, list_of_strings)):
# All strings are good
Actually, it would be cleaner to just get rid of the CheckIP function and use formatCorrect directly:
if all(map(formatCorrect, list_of_strings)):
# All strings are good
Also, as an added bonus, all uses lazy-evaluation. Meaning, it only checks as many items as are necessary before returning a result.
Note however that a more common approach would be to use a generator expression instead of map:
if all(formatCorrect(x) for x in list_of_strings):
In my opinion, generator expressions are always better than map because:
They are slightly more readable.
They are just as fast if not faster than using map. Also, in Python 2.x, map creates a list object that is often unnecessary (wastes memory). Only in Python 3.x does map use lazy-computation like a generator expression.
They are more powerful. In addition to just mapping items to a function, generator expressions allow you to perform operations on each item as they are produced. For example:
sum(x * 2 for x in (1, 2, 3))
They are preferred by most Python programmers. Keeping with convention is important when programming because it eases maintenance and makes your code more understandable.
There is talk of removing functions like map, filter, etc. from a future version of the language. Though this is not set in stone, it has come up many times in the Python community.
Of course, if you are a fan of functional programming, there isn't much chance you'll agree to points one and four. :)
An example, how you could do:
in_str = ['str1', 'str2', 'str3', 'not']
in_str2 = ['str1', 'str2', 'str3']
def CheckIP(strN):
# different than yours, just to show example.
if 'str' in strN:
return True
else:
return False
print(all(map(CheckIP, in_str))) # gives false
print(all(map(CheckIP, in_str2))) # gives true
L = [str1, str2, str3.....]
answer = list(map(CheckIP, L))
answer is a list of booleans such that answer[i] is CheckIP(L[i]). If you want to further check if all of those values are True, you could use all:
all(answer)
This returns True if and only if all the values in answer are True. However, you may do this without listifying:
all(map(CheckIP, L)), as, in python3, `map` returns an iterator, not a list. This way, you don't waste space turning everything into a list. You also save on time, as the first `False` value makes `all` return `False`, stopping `map` from computing any remaining values

Why does len() not support iterators?

Many of Python's built-in functions (any(), all(), sum() to name some) take iterables but why does len() not?
One could always use sum(1 for i in iterable) as an equivalent, but why is it len() does not take iterables in the first place?
Many iterables are defined by generator expressions which don't have a well defined len. Take the following which iterates forever:
def sequence(i=0):
while True:
i+=1
yield i
Basically, to have a well defined length, you need to know the entire object up front. Contrast that to a function like sum. You don't need to know the entire object at once to sum it -- Just take one element at a time and add it to what you've already summed.
Be careful with idioms like sum(1 for i in iterable), often it will just exhaust iterable so you can't use it anymore. Or, it could be slow to get the i'th element if there is a lot of computation involved. It might be worth asking yourself why you need to know the length a-priori. This might give you some insight into what type of data-structure to use (frequently list and tuple work just fine) -- or you may be able to perform your operation without needing calling len.
This is an iterable:
def forever():
while True:
yield 1
Yet, it has no length. If you want to find the length of a finite iterable, the only way to do so, by definition of what an iterable is (something you can repeatedly call to get the next element until you reach the end) is to expand the iterable out fully, e.g.:
len(list(the_iterable))
As mgilson pointed out, you might want to ask yourself - why do you want to know the length of a particular iterable? Feel free to comment and I'll add a specific example.
If you want to keep track of how many elements you have processed, instead of doing:
num_elements = len(the_iterable)
for element in the_iterable:
...
do:
num_elements = 0
for element in the_iterable:
num_elements += 1
...
If you want a memory-efficient way of seeing how many elements end up being in a comprehension, for example:
num_relevant = len(x for x in xrange(100000) if x%14==0)
It wouldn't be efficient to do this (you don't need the whole list):
num_relevant = len([x for x in xrange(100000) if x%14==0])
sum would probably be the most handy way, but it looks quite weird and it isn't immediately clear what you're doing:
num_relevant = sum(1 for _ in (x for x in xrange(100000) if x%14==0))
So, you should probably write your own function:
def exhaustive_len(iterable):
length = 0
for _ in iterable: length += 1
return length
exhaustive_len(x for x in xrange(100000) if x%14==0)
The long name is to help remind you that it does consume the iterable, for example, this won't work as you might think:
def yield_numbers():
yield 1; yield 2; yield 3; yield 5; yield 7
the_nums = yield_numbers()
total_nums = exhaustive_len(the_nums)
for num in the_nums:
print num
because exhaustive_len has already consumed all the elements.
EDIT: Ah in that case you would use exhaustive_len(open("file.txt")), as you have to process all lines in the file one-by-one to see how many there are, and it would be wasteful to store the entire file in memory by calling list.

What possible improvements can be made to a palindrome program?

I am just learning programming in Python for fun. I was writing a palindrome program and I thought of how I can make further improvements to it.
First thing that came to my mind is to prevent the program from having to go through the entire word both ways since we are just checking for a palindrome. Then I realized that the loop can be broken as soon as the first and the last character doesn't match.
I then implemented them in a class so I can just call a word and get back true or false.
This is how the program stands as of now:
class my_str(str):
def is_palindrome(self):
a_string = self.lower()
length = len(self)
for i in range(length/2):
if a_string[i] != a_string[-(i+1)]:
return False
return True
this = my_str(raw_input("Enter a string: "))
print this.is_palindrome()
Are there any other improvements that I can make to make it more efficient?
I think the best way to improvise write a palindrome checking function in Python is as follows:
def is_palindrome(s):
return s == s[::-1]
(Add the lower() call as required.)
What about something way simpler? Why are you creating a class instead of a simple method?
>>> is_palindrome = lambda x: x.lower() == x.lower()[::-1]
>>> is_palindrome("ciao")
False
>>> is_palindrome("otto")
True
While other answers have been given talking about how best to approach the palindrome problem in Python, Let's look at what you are doing.
You are looping through the string by using indices. While this works, it's not very Pythonic. Python for loops are designed to loop over the objects you want, not simply over numbers, as in other languages. This allows you to cut out a layer of indirection and make your code clearer and simpler.
So how can we do this in your case? Well, what you want to do is loop over the characters in one direction, and in the other direction - at the same time. We can do this nicely using the zip() and reversed() builtins. reversed() allows us to get the characters in the reversed direction, while zip() allows us to iterate over two iterators at once.
>>> a_string = "something"
>>> for first, second in zip(a_string, reversed(a_string)):
... print(first, second)
...
s g
o n
m i
e h
t t
h e
i m
n o
g s
This is the better way to loop over the characters in both directions at once. Naturally this isn't the most effective way of solving this problem, but it's a good example of how you should approach things differently in Python.
Building on Lattyware's answer - by using the Python builtins appropriately, you can avoid doing things like a_string[-(i+1)], which takes a second to understand - and, when writing more complex things than palindromes, is prone to off-by-one errors.
The trick is to tell Python what you mean to do, rather than how to achieve it - so, the most obvious way, per another answer, is to do one of the following:
s == s[::-1]
list(s) == list(reversed(s))
s == ''.join(reversed(s))
Or various other similar things. All of them say: "is the string equal to the string backwards?".
If for some reason you really do need the optimisation that you know you've got a palindrome once you're halfway (you usually shouldn't, but maybe you're dealing with extremely long strings), you can still do better than index arithmetic. You can start with:
halfway = len(s) // 2
(where // forces the result to an integer, even in Py3 or if you've done from __future__ import division). This leads to:
s[:halfway] == ''.join(reversed(s[halfway:]))
This will work for all even-length s, but fail for odd length s because the RHS will be one element longer. But, you don't care about the last character in it, since it is the middle character of the string - which doesn't affect its palindromeness. If you zip the two together, it will stop after the end of the short one - and you can compare a character at a time like in your original loop:
for f,b in zip(s[:half], reversed(s[half:])):
if f != b:
return False
return True
And you don't even need ''.join or list or such. But you can still do better - this kind of loop is so idiomatic that Python has a builtin function called all just to do it for you:
all(f == b for f,b in zip(s[:half], reversed(s[half:])))
Says 'all the characters in the front half of the list are the same as the ones in the back half of the list written backwards'.
One improvement I can see would be to use xrange instead of range.
It probably isn't a faster implementation but you could use a recursive test. Since you're learning, this construction is very useful in many situation :
def is_palindrome(word):
if len(word) < 2:
return True
if word[0] != word[-1]:
return False
return is_palindrome(word[1:-1])
Since this is a rather simple (light) function this construction might not be the fastest because of the overhead of calling the function multiple times, but in other cases where the computation are more intensive it can be a very effective construction.
Just my two cents.

Check if all values of iterable are zero

Is there a good, succinct/built-in way to see if all the values in an iterable are zeros? Right now I am using all() with a little list comprehension, but (to me) it seems like there should be a more expressive method. I'd view this as somewhat equivalent to a memcmp() in C.
values = (0, 0, 0, 0, 0)
# Test if all items in values tuple are zero
if all([ v == 0 for v in values ]) :
print 'indeed they are'
I would expect a built-in function that does something like:
def allcmp(iter, value) :
for item in iter :
if item != value :
return False
return True
Does that function exist in python and I'm just blind, or should I just stick with my original version?
Update
I'm not suggesting that allcmp() is the solution. It is an example of what I think might be more meaningful. This isn't the place where I would suggest new built-ins for Python.
In my opinion, all() isn't that meaningful. It doesn't express what "all" is checking for. You could assume that all() takes an iterable, but it doesn't express what the function is looking for (an iterable of bools that tests all of them for True). What I'm asking for is some function like my allcmp() that takes two parameters: an iterable and a comparison value. I'm asking if there is a built-in function that does something similar to my made up allcmp().
I called mine allcmp() because of my C background and memcmp(), the name of my made up function is irrelevant here.
Use generators rather than lists in cases like that:
all(v == 0 for v in values)
Edit:
all is standard Python built-in. If you want to be efficient Python programmer you should know probably more than half of them (http://docs.python.org/library/functions.html). Arguing that alltrue is better name than all is like arguing that C while should be call whiletrue. Is subjective, but i think that most of the people prefer shorter names for built-ins. This is because you should know what they do anyway, and you have to type them a lot.
Using generators is better than using numpy because generators have more elegant syntax. numpy may be faster, but you will benefit only in rare cases (generators like showed are fast, you will benefit only if this code is bottleneck in your program).
You probably can't expect nothing more descriptive from Python.
PS. Here is code if you do this in memcpm style (I like all version more, but maybe you will like this one):
list(l) == [0] * len(l)
If you know that the iterable will contain only integers then you can just do this:
if not any(values):
# etc...
If values is a numpy array you can write
import numpy as np
values = np.array((0, 0, 0, 0, 0))
all(values == 0)
The any() function may be the most simple and easy way to achieve just that. If the iterable is empty,e.g. all elements are zero, it will return False.
values = (0, 0, 0, 0, 0)
print (any(values)) # return False
The built-in set is given an iterable and returns a collection (set) of unique values.
So it can be used here as:
set(it) == {0}
assuming it is the iterable
{0} is a set containing only zero
More info on python set-types-set-frozenset here in docs.
I prefer using negation:
all(not v for v in values)

Categories

Resources