Python if vs try-except - python

I was wondering why the try-except is slower than the if in the program below.
def tryway():
    try:
        while True:
            alist.pop()
    except IndexError:
        pass
def ifway():
    while True:
        if alist == []:
            break
        else:
            alist.pop()
if __name__=='__main__':
    from timeit import Timer
    alist = range(1000)
    print "Testing Try"
    tr = Timer("tryway()","from __main__ import tryway")
    print tr.timeit()
    print "Testing If"
    ir = Timer("ifway()","from __main__ import ifway")
    print ir.timeit()
The results I get are interesting.
Testing Try
2.91111302376
Testing If
0.30621099472
Can anyone shed some light why the try is so much slower?

You're setting alist only once. The first call to "tryway" clears it, then every successive call does nothing.
def tryway():
    alist = range(1000)
    try:
        while True:
            alist.pop()
    except IndexError:
        pass
def ifway():
    alist = range(1000)
    while True:
        if alist == []:
            break
        else:
            alist.pop()
if __name__=='__main__':
    from timeit import Timer
    print "Testing Try"
    tr = Timer("tryway()","from __main__ import tryway")
    print tr.timeit(10000)
    print "Testing If"
    ir = Timer("ifway()","from __main__ import ifway")
    print ir.timeit(10000)
>>> Testing Try
>>> 2.09539294243
>>> Testing If
>>> 2.84440898895
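For anyone on Python 3, here is a rough sketch of the same fair comparison. It assumes range() has to be wrapped in list() (a Python 3 range object has no pop()), and it passes the functions straight to timeit instead of importing them from __main__. The default arguments and the return values are my additions for illustration. Treat the numbers it prints as machine-dependent.

```python
from timeit import timeit

def tryway(n=1000):
    alist = list(range(n))  # fresh list on every call, so each run does real work
    try:
        while True:
            alist.pop()
    except IndexError:
        pass  # pop() on an empty list ends the loop
    return alist

def ifway(n=1000):
    alist = list(range(n))  # fresh list on every call
    while True:
        if not alist:  # explicit emptiness test on every iteration
            break
        alist.pop()
    return alist

if __name__ == "__main__":
    print("Testing Try", timeit(tryway, number=2000))
    print("Testing If", timeit(ifway, number=2000))
```

Both functions return the emptied list, which makes it easy to check that every timed call really drained 1000 items.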

Exception handling is generally slow in most languages. Most compilers, interpreters and VMs (that support exception handling) treat exceptions (the language idiom) as exceptions (uncommon). Performance optimization involves trade-offs and making exceptions fast would typically mean other areas of the language would suffer (either in performance or simplicity of design).
At a more technical level, exceptions generally mean that the VM/interpreter (or the runtime execution library) has to save a bunch of state and begin pulling off all the state on the function call stack (called unwinding) up until the point where a valid catch (except) is found.
Or looking at it from a different viewpoint, the program stops running when an exception occurs and a "debugger" takes over. This debugger searches back through the stack (calling function data) for a catch that matches the exception. If it finds one, it cleans things up and returns control to the program at that point. If it doesn't find one then it returns control to the user (perhaps in the form of an interactive debugger or python REPL).
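To see the trade-off in Python terms, here is a small sketch that times a dictionary lookup both ways, once for a key that exists and once for a key that is missing. The names data, lookup_try and lookup_if are made up for this illustration. As far as I know, in CPython the try/except version is cheap while nothing is raised and only gets expensive when the exception actually fires, but measure on your own interpreter before relying on that.

```python
from timeit import timeit

data = {"present": 1}  # hypothetical sample data

def lookup_try(key):
    # Attempt the lookup and handle the failure afterwards.
    try:
        return data[key]
    except KeyError:
        return None

def lookup_if(key):
    # Test first, then look up.
    if key in data:
        return data[key]
    return None

if __name__ == "__main__":
    for key in ("present", "missing"):
        t_try = timeit(lambda: lookup_try(key), number=200_000)
        t_if = timeit(lambda: lookup_if(key), number=200_000)
        print(f"{key!r}: try/except {t_try:.3f}s, if {t_if:.3f}s")
```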

If you are really interested in speed, both of your contestants could do with losing some weight.
while True: is slower than while 1: -- True is a global "variable" which is loaded and tested; 1 is a constant and the compiler does the test and emits an unconditional jump.
while True: is redundant in ifway. Fold the while/if/break together: while alist != []:
while alist != []: is a slow way of writing while alist:
Try this:
def tryway2():
    alist = range(1000)
    try:
        while 1:
            alist.pop()
    except IndexError:
        pass
def ifway2():
    alist = range(1000)
    while alist:
        alist.pop()

There is still a faster way: iterating with for, though sometimes we want the list to physically shrink so we know how many items are left. Then alist should be a parameter to the generator. (John is also right about while alist:) I turned each function into a generator and used list(whileway()) etc. so the values are actually consumed outside the function (even if they are not used afterwards):
def tryway():
    alist = range(1000)
    try:
        while True:
            yield alist.pop()
    except IndexError:
        pass
def whileway():
    alist = range(1000)
    while alist:
        yield alist.pop()
def forway():
    alist = range(1000)
    for item in alist:
        yield item
if __name__=='__main__':
    from timeit import Timer
    print "Testing Try"
    tr = Timer("list(tryway())","from __main__ import tryway")
    print tr.timeit(10000)
    print "Testing while"
    ir = Timer("list(whileway())","from __main__ import whileway")
    print ir.timeit(10000)
    print "Testing for"
    ir = Timer("list(forway())","from __main__ import forway")
    print ir.timeit(10000)
J:\test>speedtest4.py
Testing Try
6.52174983133
Testing while
5.08004508953
Testing for
2.14167694497

Not sure, but I think it's something like this: the while True loop follows the normal instruction flow, which means the processor can pipeline and do all sorts of nice things. Exceptions jump straight through all that, so the VM needs to handle them specially, and that takes time.

Defensive programming requires that one test for conditions that are rare and/or abnormal, some of which will not occur over the course of a year or many years. In these circumstances try-except may be justified.

Just thought to toss this into the mix:
I tried the script below, which seems to suggest that handling an exception is slower than handling an else statement:
import time
n = 10000000
l = range(0, n)
t0 = time.time()
for i in l:
    try:
        i[0]
    except TypeError:
        pass
t1 = time.time()
for i in l:
    if type(i) == list:
        print(i)
    else:
        pass
t2 = time.time()
print(t1-t0)
print(t2-t1)
gives:
5.5908801555633545
3.512694835662842
So, (even though I know someone will likely comment upon the use of time rather than timeit), there appears to be a ~60% slow down using try/except in loops. So, perhaps better to go with if/else when going through a for loop of several billion items.

Related

How to solve StopIteration error in Python?

I have just read a bunch of posts on how to handle the StopIteration error in Python, but I had trouble solving my particular example. I just want to print out the numbers from 1 to 20 with my code, but it raises StopIteration. My code is below. (I am a complete newbie here, so please don't block me.)
def simpleGeneratorFun(n):
    while n<20:
        yield (n)
        n=n+1
    # return [1,2,3]
x = simpleGeneratorFun(1)
while x.__next__() <20:
    print(x.__next__())
    if x.__next__()==10:
        break
Every call to x.__next__() fetches the next yielded number, so you never check each yielded value. The value 10 is consumed by a different call than the one that tests for it, so the loop keeps running until the generator is exhausted and raises StopIteration.
Fix:
def simpleGeneratorFun(n):
    while n<20:
        yield (n)
        n=n+1
    # return [1,2,3]
x = simpleGeneratorFun(1)
while True:
    try:
        val = next(x) # x.__next__() is "private", see @Aran-Frey's comment
        print(val)
        if val == 10:
            break
    except StopIteration as e:
        print(e)
        break
First, in each loop iteration, you're advancing the iterator 3 times by making 3 separate calls to __next__(), so the if x.__next__()==10 might never be hit since the 10th element might have been consumed earlier. Same with missing your while condition.
Second, there are usually better patterns in Python where you don't need to call next directly. For example, if you have a finite iterator, use a for loop, which stops automatically on StopIteration:
x = simpleGeneratorFun(1)
for i in x:
    print(i)
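Putting both points together, here is a minimal sketch. The generator name count_up and its inclusive stop parameter are my own choices for illustration, picked so the range really covers 1 to 20. The for loop calls next() exactly once per iteration, and next() with a default covers the case where you do want to step manually.

```python
def count_up(start, stop):
    """Yield start, start + 1, ..., stop (inclusive)."""
    n = start
    while n <= stop:
        yield n
        n += 1

# One next() call per iteration; the loop ends cleanly when the generator is done.
for value in count_up(1, 20):
    print(value)
    if value == 10:
        break

# Manual stepping: the second argument is returned instead of raising StopIteration.
gen = count_up(1, 2)
manual = [next(gen, None), next(gen, None), next(gen, None)]
```

After the two real values are consumed, the third manual call returns the default None rather than raising.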

Stop iteration from inside a function

I am using an iterator to flexibly go through a collection. There are several places in my function where it gets a new item and processes it, so something like this happens several times:
it = iter(range(10))
while condition:
    try:
        item = next(it)
        item.do_many_different_things()
    except StopIteration:
        break
And that makes everything extremely messy, so I wanted to move it into a separate method. But then I can't use break, because Python doesn't know which loop it should break out of. So far I'm returning None and breaking the loop if None was returned. But is there a more elegant solution?
You can return a value from the do_many_different_things() function and change the condition variable accordingly, so it breaks out of the while loop as needed.
def func(it):
    item = next(it)
    res = item.do_many_different_things()
    yield res
it = iter(range(1, 10))
condition = True
while condition:
    for item in func(it):
        condition = item
This will run all elements from 1..9, because they are all truthy. If you start this with the regular range(10) it would stop on the first element since it's 0.
Once the method returns False, the while loop breaks.
I don't know what your code, nested loops and items are, so I will just show you how to break out from nested loops spread across different functions. I will also show you, how you can differentiate three cases:
1. Your item.do_many_different_things() method wants to break
2. You ran out of items
3. Your condition evaluates to False
This is purely educational to show you some Python features which you might find useful, not necessarily in this exact combination.
from __future__ import print_function
# I'm on Python 3 - you will need the above line on Python 2
# I don't know what your code is supposed to do so I'll just generate random integers
from random import Random
r = Random()
r.seed()
class BreakOutNested(Exception): pass
class Item(object):
    def do_many_different_things(self):
        x = r.randint(0, 50)
        if x == 50:
            raise BreakOutNested()
        self.x = x
def iterator_0(item):
    for i in range(5):
        item.do_many_different_things()
        yield i
def iterator_1(items):
    for item in items:
        for i in iterator_0(item):
            item.i = i
            yield item
items = iterator_1(Item() for i in range(5))
x = 50
try:
    while x != 0:
        item = next(items)
        print(item.i, item.x)
        x = item.x
except BreakOutNested:
    print('Broke out from many loops with exception trick')
except StopIteration:
    print('Simply ran out of items')
else:
    print('Got x == 0')
Run this a couple of times as the exit scenario is random.
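If the only goal is to move the next()/StopIteration handling into a helper, a lighter option is to let the helper report whether it got an item. This is just a sketch: process_next and _DONE are hypothetical names, and the print call stands in for item.do_many_different_things(). A dedicated sentinel object is used instead of None so that a collection containing None still works.

```python
_DONE = object()  # unique sentinel; cannot collide with a real item, unlike None

def process_next(it):
    """Hypothetical helper: handle one item, return False once `it` is exhausted."""
    item = next(it, _DONE)
    if item is _DONE:
        return False
    print("processing", item)  # stand-in for item.do_many_different_things()
    return True

it = iter(range(3))
handled = 0
while process_next(it):  # the loop ends as soon as the helper reports exhaustion
    handled += 1
```

The caller decides when to stop, so no break has to cross a function boundary.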

Slow processing time of a list

Why is my code so sluggish (inefficient)? I need to write two methods and record the time each takes to process a list of a given size. I have a search_fast and a search_slow method. Even though there is a difference between the two search times, search_fast is still pretty slow. I'd like to optimise the processing time so that, instead of taking 8.99038815498 with search_fast and 65.0739619732 with search_slow, it takes only a fraction of a second. What can I do? I'd be eternally grateful for some tips, as coding is still pretty new to me. :)
from timeit import Timer
def fillList(l, n):
    l.extend(range(1, n + 1))
l = []
fillList(l, 100)
def search_fast(l):
    for item in l:
        if item == 10:
            return True
    return False
def search_slow(l):
    return_value = False
    for item in l:
        if item == 10:
            return_value = True
    return return_value
t = Timer(lambda: search_fast(l))
print t.timeit()
t = Timer(lambda: search_slow(l))
print t.timeit()
The fastest way is to use the in operator, which tests membership of a value in a sequence.
if value in some_container:
    …
Reference: https://docs.python.org/3/reference/expressions.html#membership-test-operations
Update: also, if you frequently need to test the membership, consider using sets instead of lists.
Some pros and cons can be found here: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
Adding the following code to above:
t = Timer(lambda: 10 in l)
print(t.timeit())
produces the following on my system:
0.6166538814701169
3.884095008084452
0.29087270299795875
>>>
Hope this helps. The basic idea is to tap into underlying C code and not make your own Python code.
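Here is a rough sketch comparing the three approaches mentioned in this answer: a hand-written loop, in on a list, and in on a set. The names items, item_set and search_loop are made up for the example. One thing worth keeping in mind when reading the timings in this thread: Timer.timeit() runs the statement 1,000,000 times by default, so the reported seconds are totals for a million calls, not the cost of one search.

```python
from timeit import timeit

items = list(range(1, 101))
item_set = set(items)  # one-time conversion; later lookups are hash-based

def search_loop(seq, target):
    # Hand-written linear search, equivalent to search_fast in the question.
    for item in seq:
        if item == target:
            return True
    return False

found_loop = search_loop(items, 10)
found_in = 10 in items     # linear search, but done in C
found_set = 10 in item_set  # hash lookup

if __name__ == "__main__":
    cases = [
        ("loop", lambda: search_loop(items, 100)),
        ("list in", lambda: 100 in items),
        ("set in", lambda: 100 in item_set),
    ]
    for label, fn in cases:
        print(label, timeit(fn, number=100_000))
```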
I managed to find out what made the code sluggish. It was a simple mistake of adding to the list by extend instead of append.
def fillList(l, n):
    l.append(range(1, n + 1))
l = []
fillList(l, 100)
Now search_slow clocks in at 3.91826605797 instead of 65.0739619732. But I have no idea why it changes the performance so much.
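A likely explanation, sketched below under the assumption that nothing else in the script changed: append adds the whole range as a single element, so the list ends up with one item instead of 100. The loop then runs once and compares that one object with 10, which also means the search no longer finds anything. The variable names extended and appended are just for this illustration.

```python
extended = []
extended.extend(range(1, 101))  # adds each number separately: 100 elements

appended = []
appended.append(range(1, 101))  # adds the range itself: 1 element

def search_slow(l):
    # Same logic as in the question: scan everything, remember a match.
    return_value = False
    for item in l:
        if item == 10:
            return_value = True
    return return_value

print(len(extended), search_slow(extended))
print(len(appended), search_slow(appended))
```

So the speed-up comes from doing a hundredth of the work, not from append being faster than extend.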

How to loop through a generator

How can one loop through a generator? I thought about this way:
gen = function_that_returns_a_generator(param1, param2)
if gen: # in case the generator is null
    while True:
        try:
            print gen.next()
        except StopIteration:
            break
Is there a more pythonic way?
Simply
for x in gen:
    # whatever
will do the trick. Note that the test "if gen" always evaluates to True.
for item in function_that_returns_a_generator(param1, param2):
    print item
You don't need to worry about the test to see if there is anything being returned by your function as if there's nothing returned you won't enter the loop.
In case you don't need the output of the generator because you care only about its side effects, you can use the following one-liner:
for _ in gen: pass
Follow up
Following the comment by aiven I made some performance tests, and while it seems that list(gen) is slightly faster than for _ in gen: pass, it comes out that tuple(gen) is even faster. However, as Erik Aronesty correctly points out, tuple(gen) and list(gen) store the results, so my final advice is to use
tuple(gen)
but only if the generator is not going to loop billions of times soaking up too much memory.
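If memory is the concern, another option I know of is collections.deque with maxlen=0, which pulls every item from the generator but keeps none of them. A minimal sketch follows; the generator noisy and the list seen are hypothetical, standing in for any generator that is run only for its side effects.

```python
from collections import deque

seen = []

def noisy(n):
    """Hypothetical generator whose only purpose is its side effect."""
    for i in range(n):
        seen.append(i)  # the side effect we care about
        yield i

# maxlen=0 means the deque discards each item as soon as it is pulled.
deque(noisy(5), maxlen=0)
```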
You can simply loop through it:
>>> gen = (i for i in range(1, 4))
>>> for i in gen: print i
1
2
3
But be aware that you can only loop once. The next time, the generator will be empty:
>>> for i in gen: print i
>>>
The other answers are good for complicated scenarios. If you simply want to stream the items into a list:
x = list(generator)
For simple preprocessing, use list comprehensions:
x = [tup[0] for tup in generator]
If you just want to execute the generator without saving the results, you can skip variable assignment:
# no var assignment b/c we don't need what print() returns
[print(_) for _ in gen]
Don't do this if your generator is infinite (say, streaming items from the internet). The list construction is a blocking op that won't stop until the generator is empty.
Just treat it like any other iterable:
for val in function_that_returns_a_generator(p1, p2):
    print val
Note that if gen: will always be True, so it's a false test.
If you want to manually move through the generator (i.e., to work with each loop manually) then you could do something like this:
from pdb import set_trace
for x in gen:
    set_trace()
    #do whatever you want with x at the command prompt
    #use pdb commands to step through each loop of the generator e.g., >>c #continue

How to retrieve an element from a set without removing it?

Suppose the following:
>>> s = set([1, 2, 3])
How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it - something I can only be sure of after an asynchronous call to another host.
Quick and dirty:
>>> elem = s.pop()
>>> s.add(elem)
But do you know of a better way? Ideally in constant time.
Two options that don't require copying the whole set:
for e in s:
    break
# e is now an element from s
Or...
e = next(iter(s))
But in general, sets don't support indexing or slicing.
Least code would be:
>>> s = set([1, 2, 3])
>>> list(s)[0]
1
Obviously this would create a new list which contains each member of the set, so not great if your set is very large.
I wondered how the functions will perform for different sets, so I did a benchmark:
from random import sample
from iteration_utilities import first
def ForLoop(s):
    for e in s:
        break
    return e
def IterNext(s):
    return next(iter(s))
def ListIndex(s):
    return list(s)[0]
def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e
def RandomSample(s):
    return sample(s, 1)
def SetUnpacking(s):
    e, *_ = s
    return e
from simple_benchmark import benchmark
b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking, first],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})
b.plot()
This plot clearly shows that some approaches (RandomSample, SetUnpacking and ListIndex) depend on the size of the set and should be avoided in the general case (at least if performance might be important). As already shown by the other answers the fastest way is ForLoop.
However as long as one of the constant time approaches is used the performance difference will be negligible.
iteration_utilities (Disclaimer: I'm the author) contains a convenience function for this use-case: first:
>>> from iteration_utilities import first
>>> first({1,2,3,4})
1
I also included it in the benchmark above. It can compete with the other two "fast" solutions but the difference isn't much either way.
tl;dr
for first_item in muh_set: break remains the optimal approach in Python 3.x. Curse you, Guido.
y u do this
Welcome to yet another set of Python 3.x timings, extrapolated from wr.'s excellent Python 2.x-specific response. Unlike AChampion's equally helpful Python 3.x-specific response, the timings below also time outlier solutions suggested above – including:
list(s)[0], John's novel sequence-based solution.
random.sample(s, 1), dF.'s eclectic RNG-based solution.
Code Snippets for Great Joy
Turn on, tune in, time it:
from timeit import Timer
stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]
for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()
Quickly Obsoleted Timeless Timings
Behold! Ordered by fastest to slowest snippets:
$ ./test_get.py
Time for for i in range(1000):
for x in s:
break: 0.249871
Time for for i in range(1000): next(iter(s)): 0.526266
Time for for i in range(1000): s.add(s.pop()): 0.658832
Time for for i in range(1000): list(s)[0]: 4.117106
Time for for i in range(1000): random.sample(s, 1): 21.851104
Faceplants for the Whole Family
Unsurprisingly, manual iteration remains at least twice as fast as the next fastest solution. Although the gap has decreased from the Bad Old Python 2.x days (in which manual iteration was at least four times as fast), it disappoints the PEP 20 zealot in me that the most verbose solution is the best. At least converting a set into a list just to extract the first element of the set is as horrible as expected. Thank Guido, may his light continue to guide us.
Surprisingly, the RNG-based solution is absolutely horrible. List conversion is bad, but random really takes the awful-sauce cake. So much for the Random Number God.
I just wish the amorphous They would PEP up a set.get_first() method for us already. If you're reading this, They: "Please. Do something."
To provide some timing figures behind the different approaches, consider the following code.
The get() is my custom addition to Python's setobject.c, being just a pop() without removing the element.
from timeit import *
stats = ["for i in xrange(1000): iter(s).next() ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop()) ",
         "for i in xrange(1000): s.get() "]
for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()
The output is:
$ ./test_get.py
Time for for i in xrange(1000): iter(s).next() : 0.433080
Time for for i in xrange(1000):
for x in s:
break: 0.148695
Time for for i in xrange(1000): s.add(s.pop()) : 0.317418
Time for for i in xrange(1000): s.get() : 0.146673
This means that the for/break solution is the fastest (sometimes faster than the custom get() solution).
Since you want a random element, this will also work:
>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]
The documentation doesn't seem to mention performance of random.sample. From a really quick empirical test with a huge list and a huge set, it seems to be constant time for a list but not for the set. Also, iteration over a set isn't random; the order is undefined but predictable:
>>> list(set(range(10))) == range(10)
True
If randomness is important and you need a bunch of elements in constant time (large sets), I'd use random.sample and convert to a list first:
>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time
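A small sketch of the convert-once idea for current Python 3. As far as I know, random.sample no longer accepts a set directly since Python 3.11 (the population has to be a sequence), so converting first is needed there anyway. The name pool is just for this example, and the picks are random, so no output is shown.

```python
import random

s = {1, 2, 3}

# Convert once (linear in the size of the set); later picks work on the sequence.
pool = tuple(s)
e = random.choice(pool)        # one random element
pair = random.sample(pool, 2)  # two distinct random elements
```

If the set changes later, pool has to be rebuilt, so this only pays off when many picks are made between changes.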
Yet another way in Python 3:
next(iter(s))
or
s.__iter__().__next__()
Seemingly the most compact (6 symbols) though very slow way to get a set element (made possible by PEP 3132):
e,*_=s
With Python 3.5+ you can also use this 7-symbol expression (thanks to PEP 448):
[*s][0]
Both options are roughly 1000 times slower on my machine than the for-loop method.
I use a utility function I wrote. Its name is somewhat misleading because it kind of implies it might be a random item or something like that.
def anyitem(iterable):
    try:
        return iter(iterable).next()
    except StopIteration:
        return None
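The function above uses the Python 2 iterator method .next(). On Python 3 the same utility can be sketched with the built-in next() and its default argument, which removes the need for try/except. The default parameter is my addition; the behaviour otherwise follows the version above.

```python
def anyitem(iterable, default=None):
    """Return an arbitrary item from `iterable`, or `default` if it is empty."""
    return next(iter(iterable), default)

s = {1, 2, 3}
item = anyitem(s)       # some element of s; the set itself is left untouched
empty = anyitem(set())  # falls back to the default
```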
Following @wr.'s post, I get similar results (for Python 3.5):
from timeit import *
stats = ["for i in range(1000): next(iter(s))",
         "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in range(1000): s.add(s.pop())"]
for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()
Output:
Time for for i in range(1000): next(iter(s)): 0.205888
Time for for i in range(1000):
for x in s:
break: 0.083397
Time for for i in range(1000): s.add(s.pop()): 0.226570
However, when changing the underlying set (e.g. call to remove()) things go badly for the iterable examples (for, iter):
from timeit import *
stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
         "while s:\n\tfor x in s: break\n\ts.remove(x)",
         "while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]
for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()
Results in:
Time for while s:
a = next(iter(s))
s.remove(a): 2.938494
Time for while s:
for x in s: break
s.remove(x): 2.728367
Time for while s:
x=s.pop()
s.add(x)
s.remove(x): 0.030272
What I usually do for small collections is to create kind of parser/converter method like this
def convertSetToList(setName):
    return list(setName)
Then I can use the new list and access by index number
userFields = convertSetToList(user)
name = request.json[userFields[0]]
As a list you will have all the other methods that you may need to work with
You can unpack the values to access the elements:
s = set([1, 2, 3])
v1, v2, v3 = s
print(v1,v2,v3)
#1 2 3
If you want just the first element, try this:
b = (a-set()).pop()
How about s.copy().pop()? I haven't timed it, but it should work and it's simple. It works best for small sets however, as it copies the whole set.
Another option is to use a dictionary with values you don't care about. E.g.,
poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
...
You can treat the keys as a set except that they're just an array:
keys = poor_man_set.keys()
print "Some key = %s" % keys[0]
A side effect of this choice is that your code will be backwards compatible with older, pre-set versions of Python. It's maybe not the best answer but it's another option.
Edit: You can even do something like this to hide the fact that you used a dict instead of an array or set:
poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
poor_man_set = poor_man_set.keys()
