Is one faster than the other? func() or module.func() - Python

I see that from x import * is discouraged all over the place - it pollutes the namespace, etc.
So I'm inclined to use from . import x, and when I need its functions I'll call x.func() instead of just func().
The speed difference is probably tiny, but I'd still like to know how much it might impact performance, so that I can keep the good habit without worrying about anything else.

It has practically no impact:
>>> import timeit
>>> timeit.timeit('math.pow(1, 1)', 'import math')
0.20310196322982677
>>> timeit.timeit('pow(1, 1)', 'from math import pow')
0.19039931574786806
Note I picked a function that would have very little run time so that any difference would be magnified.
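The small gap that does exist comes mostly from the extra attribute lookup math.pow performs on every call. If a tight loop ever makes that lookup matter, a common trick is to bind the function to a local name first; a minimal sketch (sum_sqrts is just an illustrative name):
import math

def sum_sqrts(n):
    sqrt = math.sqrt  # bind once: avoids a module attribute lookup on every call
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total
This keeps the tidy module.func() style at the top of the file while paying the lookup cost only once, outside the hot loop.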

Related

My Python code in general is very slow, is this normal?

I recently began teaching myself Python, and have been using it for an online course in algorithms. For some reason, much of the code I have written for this course is very slow (relative to the C/C++ and Matlab code I have written in the past), and I'm starting to worry that I am not using Python properly.
Here is a simple python and matlab code to compare their speed.
MATLAB
for i = 1:100000000
    a = 1 + 1
end
Python
for i in list(range(0, 100000000)):
    a = 1 + 1
The Matlab code takes about 0.3 seconds, and the Python code takes about 7 seconds. Is this normal? My Python code for more complex problems is also very slow. For example, as a homework assignment, I'm running depth-first search on a graph with about 900000 nodes, and this is taking forever. Thank you.
Performance is not an explicit design goal of Python:
Don’t fret too much about performance--plan to optimize later when needed.
That's one of the reasons Python integrates with so many high-performance computing backends, such as numpy, OpenBLAS and even CUDA, just to name a few.
The best way forward if you want to increase performance is to let high-performance libraries do the heavy lifting for you. Optimizing loops within Python (e.g. by using xrange instead of range in Python 2.7) won't get you very dramatic results.
Here is a bit of code that compares different approaches:
Your original list(range())
The suggested use of xrange()
Leaving the i out
Using numpy to do the addition on numpy arrays (vector addition)
Using CUDA to do vector addition on the GPU
Code:
import timeit
import matplotlib.pyplot as mplplt
iter = 100
testcode = [
    "for i in list(range(1000000)): a = 1+1",
    "for i in xrange(1000000): a = 1+1",
    "for _ in xrange(1000000): a = 1+1",
    "import numpy; one = numpy.ones(1000000); a = one+one",
    "import pycuda.gpuarray as gpuarray; import pycuda.driver as cuda; import pycuda.autoinit; import numpy; "
    "one_gpu = gpuarray.GPUArray((1000000),numpy.int16); one_gpu.fill(1); a = (one_gpu+one_gpu).get()"
]
labels = ["list(range())", "i in xrange()", "_ in xrange()", "numpy", "numpy and CUDA"]
timings = [timeit.timeit(t, number=iter) for t in testcode]
print labels, timings
label_idx = range(len(labels))
mplplt.bar(label_idx, timings)
mplplt.xticks(label_idx, labels)
mplplt.ylabel('Execution time (sec)')
mplplt.title('Timing of integer addition in python 2.7\n(smaller value is better performance)')
mplplt.show()
Results (graph), run on Python 2.7.13 on OS X:
The reason Numpy outperforms the CUDA solution here is that the overhead of using CUDA outweighs its gains over Python+Numpy at this problem size. For larger, floating-point calculations, CUDA does even better than Numpy.
Note that the Numpy solution performs more than 80 times faster than your original solution. If your timings are correct, this would even be faster than Matlab...
A final note on DFS (depth-first search): here is an interesting article on DFS in Python.
Try using xrange instead of range.
The difference between them is that xrange generates the values lazily as you use them, whereas range builds the whole list up front at runtime.
Unfortunately, Python's amazing flexibility and ease come at the cost of speed. Also, for such large iteration counts, I suggest using the itertools module, as its looping helpers are implemented in C.
xrange is a good solution; however, if you want to iterate over dictionaries and the like, it's better to use itertools, which can iterate over any kind of sequence or iterable.
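For what it's worth, here is a rough sketch of the itertools idea. The timeit module itself drives its measurement loop with itertools.repeat, because repeat hands back the same object n times without creating a new integer object on each pass (run_many is an illustrative name):
import itertools

def run_many(n):
    # repeat(None, n) yields None n times; no per-iteration integer bookkeeping
    for _ in itertools.repeat(None, n):
        a = 1 + 1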

Timeit module - Passing objects to setup?

I am trying to use the timeit module to time the speed of an algorithm that analyzes data.
The problem is that I have to run some setup code in order to run this algorithm. Specifically, I have to load some documents from a database and turn them into a matrix representation.
The timeit module does not seem to allow me to pass in the matrix object, and instead forces me to set this up all over again in the setup parameter. Unfortunately this means that the running time of my algorithm is fuzzed by the running time of pre-processing.
Is there some way to pass in objects that were created already, to timeit in the setup parameter? Otherwise, how can I deal with situations where the setup code takes a nontrivial amount of time, and I don't want that to fuzz the code block I actually am trying to test?
Am I approaching this the wrong way?
The running time of your algorithm is not fuzzed by the running time of the pre-processing, and this can be demonstrated. Suppose I declare a list in the __main__ module and run timeit to find the index of some item in that list. I need to pass the list to timeit too, and passing the list is a kind of pre-processing. The time returned by timeit is 0.26 sec (see the code below). Now, if timeit had counted the pre-processing time (importing the list from __main__) as well, the result would have been almost 1.1 sec, because importing the list from __main__ takes 0.84 sec for 1000000 iterations (see the code below). What timeit actually does is import the list from __main__ only once, and then measure the time required by the algorithm for the given number of iterations.
>>> import timeit
>>> lst = range(10)
>>> timeit.timeit('lst.index(9)', 'from __main__ import lst', number = 1000000)
0.2645089626312256
>>> timeit.timeit('from __main__ import lst', number = 1000000)
0.8406829833984375
The time it takes to run the setup code doesn't affect the timeit module's timing calculations.
You should be able to pass your matrix into the setup parameter using an import, e.g.
"from __main__ import mymatrix"

Simplify statement '.'.join( string.split('.')[0:3] )

I am used to coding in C/C++, and when I see the following array operation I feel some CPU cycles being wasted:
version = '1.2.3.4.5-RC4' # the end can vary a lot
api = '.'.join( version.split('.')[0:3] ) # extract '1.2.3'
Therefore I wonder:
Will this line be executed (interpreted) as the creation of a temporary array (a memory allocation), followed by concatenation of the first three cells (another memory allocation)?
Or is the Python interpreter smart enough?
(I am also curious about optimizations made in this context by Pythran, Parakeet, Numba, Cython, and other python interpreters/compilers...)
Is there a trick to write a replacement line more CPU efficient and still understandable/elegant?
(You can provide specific Python2 and/or Python3 tricks and tips)
I have no idea about the CPU usage here, but isn't this why we use high-level languages in the first place?
Another solution would be to use regular expressions; a compiled pattern should allow some behind-the-scenes optimisation:
import re

version = '1.2.3.4.5-RC4'
pat = re.compile(r'^(\d+\.\d+\.\d+)')
res = pat.match(version)  # note: pat.match, not re.match
if res:
    print res.group(1)
Edit: As suggested by @jonrsharpe, I also ran the timeit benchmark. Here are my results:
def extract_vers(version):
    res = pat.match(version)
    if res:
        return res.group(1)
    else:
        return False
>>> timeit.timeit("api1(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.9013631343841553
>>> timeit.timeit("api2(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.3482811450958252
>>> timeit.timeit("extract_vers(s)", setup="from __main__ import extract_vers,api1,api2; s='1.2.3.4.5-RC4'")
1.174590826034546
Edit: In any case, libraries such as distutils.version already exist in Python to do the job.
You should have a look at that answer.
To answer your first question: no, this will not be optimised out by the interpreter. Python will create a list from the string, then create a second list for the slice, then put the list items back together into a new string.
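You can check this for yourself with the standard dis module: because version is an ordinary name, the compiler cannot fold anything away, and the bytecode shows the split, slice and join happening at run time (a quick check, works on Python 2 and 3):
import dis

dis.dis(compile("'.'.join(version.split('.')[0:3])", '<demo>', 'eval'))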
To cover the second, you can optimise this slightly by limiting the split with the optional maxsplit argument:
>>> v = '1.2.3.4.5-RC4'
>>> v.split(".", 3)
['1', '2', '3', '4.5-RC4']
Once the third '.' is found, Python stops searching through the string. You can also neaten it slightly by dropping the default 0 from the slice:
api = '.'.join(version.split('.', 3)[:3])
Note, however, that any difference in performance is negligible:
>>> import timeit
>>> def test1(version):
...     return '.'.join(version.split('.')[0:3])
>>> def test2(version):
...     return '.'.join(version.split('.', 3)[:3])
>>> timeit.timeit("test1(s)", setup="from __main__ import test1, test2; s = '1.2.3.4.5-RC4'")
1.0458565345561743
>>> timeit.timeit("test2(s)", setup="from __main__ import test1, test2; s = '1.2.3.4.5-RC4'")
1.0842980287537776
The benefit of maxsplit becomes clearer with longer strings containing more irrelevant '.'s:
>>> timeit.timeit("s.split('.')", setup="s='1.'*100")
3.460900054011617
>>> timeit.timeit("s.split('.', 3)", setup="s='1.'*100")
0.5287887450379003
I am used to coding in C/C++, and when I see the following array operation I feel some CPU cycles being wasted:
A feeling of wasted CPU cycles is absolutely normal for C/C++ programmers facing Python code. Your code:
version = '1.2.3.4.5-RC4' # the end can vary a lot
api = '.'.join(version.split('.')[0:3]) # extract '1.2.3'
is absolutely fine in Python; there is no simplification possible. Only if you have to do it thousands of times should you consider using a library function or writing your own.
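That said, if this really did sit in a hot loop, a hand-rolled scan in the spirit of the C approach could avoid the intermediate list entirely; a sketch (extract_api is a made-up name):
def extract_api(version):
    # find the third '.' and slice once; no temporary list is built
    pos = -1
    for _ in range(3):
        pos = version.find('.', pos + 1)
        if pos == -1:
            return version  # fewer than three dots: keep the whole string
    return version[:pos]

print extract_api('1.2.3.4.5-RC4')  # -> 1.2.3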

Using __slots__ under PyPy

I have this simple code that helped me to measure how classes with __slots__ perform (taken from here):
import timeit
def test_slots():
    class Obj(object):
        __slots__ = ('i', 'l')
        def __init__(self, i):
            self.i = i
            self.l = []
    for i in xrange(1000):
        Obj(i)
print timeit.Timer('test_slots()', 'from __main__ import test_slots').timeit(10000)
If I run it via Python 2.7, I get something around 6 seconds - OK, it's really faster (and also more memory-efficient) than without slots.
But if I run the code under PyPy (2.2.1, 64-bit for Mac OS X), it starts to use 100% CPU and "never" returns (I waited for minutes - no result).
What is going on? Should I use __slots__ under PyPy?
Here's what happens if I pass different number to timeit():
timeit(10) - 0.067s
timeit(100) - 0.5s
timeit(1000) - 19.5s
timeit(10000) - ? (probably more than a Game of Thrones episode)
Thanks in advance.
Note that the same behavior is observed if I use namedtuples:
import collections
import timeit
def test_namedtuples():
    Obj = collections.namedtuple('Obj', 'i l')
    for i in xrange(1000):
        Obj(i, [])
print timeit.Timer('test_namedtuples()', 'from __main__ import test_namedtuples').timeit(10000)
In each of the 10,000 or so iterations of the timeit code, the class is recreated from scratch. Creating classes is probably not a well-optimized operation in PyPy; even worse, doing so will probably discard all of the optimizations that the JIT learned about the previous incarnation of the class. PyPy tends to be slow until the JIT has warmed up, so doing things that require it to warm up repeatedly will kill your performance.
The solution here is, of course, to simply move the class definition outside of the code being benchmarked.
To directly answer the question in the title: __slots__ is pointless for (but doesn't hurt) performance in PyPy.
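For completeness, a version of the benchmark fixed along those lines might look like this (class hoisted out, so the JIT sees one stable class):
import timeit

class Obj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
        self.i = i
        self.l = []

def test_slots():
    for i in xrange(1000):
        Obj(i)

print timeit.Timer('test_slots()', 'from __main__ import test_slots').timeit(10000)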

Python: why pickle?

I have been using pickle and was very happy, then I saw this article: Don't Pickle Your Data
Reading further it seems like:
Pickle is slow
Pickle is unsafe
Pickle isn’t human readable
Pickle isn’t language-agnostic
I’ve switched to saving my data as JSON, but I wanted to know about best practice:
Given all these issues, when would you ever use pickle? What specific situations call for using it?
Pickle is unsafe because it constructs arbitrary Python objects by invoking arbitrary functions. However, this also gives it the power to serialize almost any Python object, without any boilerplate or even white-/black-listing (in the common case). That's very desirable for some use cases:
Quick & easy serialization, for example for pausing and resuming a long-running but simple script. None of the concerns matter here, you just want to dump the program's state as-is and load it later.
Sending arbitrary Python data to other processes or computers, as in multiprocessing. The security concerns may apply (but mostly don't), the generality is absolutely necessary, and humans won't have to read it.
In other cases, none of the drawbacks is quite enough to justify the work of mapping your stuff to JSON or another restrictive data model. Maybe you don't expect to need human readability/safety/cross-language compatibility or maybe you can do without. Remember, You Ain't Gonna Need It. Using JSON would be the right thing™ but right doesn't always equal good.
You'll notice that I completely ignored the "slow" downside. That's because it's partially misleading: Pickle is indeed slower for data that fits the JSON model (strings, numbers, arrays, maps) perfectly, but if your data's like that you should use JSON for other reasons anyway. If your data isn't like that (very likely), you also need to take into account the custom code you'll need to turn your objects into JSON data, and the custom code you'll need to turn JSON data back into your objects. It adds both engineering effort and run-time overhead, which must be quantified on a case-by-case basis.
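As a rough illustration of that custom-code cost, round-tripping even a trivial class through JSON needs hand-written mapping in both directions, while pickle handles it in one call each way (the Point class here is hypothetical):
import json
import pickle

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# pickle: no extra code for the object graph
p2 = pickle.loads(pickle.dumps(p))

# JSON: map to and from the restricted data model yourself
blob = json.dumps({'x': p.x, 'y': p.y})
d = json.loads(blob)
p3 = Point(d['x'], d['y'])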
Pickle has the advantage of convenience -- it can serialize arbitrary object graphs with no extra work, and works on a pretty broad range of Python types. With that said, it would be unusual for me to use Pickle in new code. JSON is just a lot cleaner to work with.
I usually use neither pickle nor JSON, but MessagePack: it is both safe and fast, and produces compact serialized data.
An additional advantage is the possibility of exchanging data with software written in other languages (which of course is also true of JSON).
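Usage is close to the json module; a minimal sketch (assuming the msgpack package is installed):
import msgpack

data = {'id': 7, 'samples': [0.5, 1.0, 1.5]}
packed = msgpack.packb(data)        # compact bytes
restored = msgpack.unpackb(packed)  # note: old msgpack versions return byte string keys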
I have tried several methods and found that cPickle, with the protocol argument of the dumps method set as cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL), is the fastest dump method.
import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np
num_tests = 10
obj = np.random.normal(0.5, 1, [240, 320, 3])
command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle: %f seconds" % result)
command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle: %f seconds" % result)
command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest: %f seconds" % result)
command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json: %f seconds" % result)
command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack: %f seconds" % result)
Output:
pickle : 0.847938 seconds
cPickle : 0.810384 seconds
cPickle highest: 0.004283 seconds
json : 1.769215 seconds
msgpack : 0.270886 seconds
So, I prefer cPickle with the highest dump protocol in situations that require real-time performance, such as video streaming from a camera to a server.
You can find some answers on JSON vs. pickle security: JSON can only serialize unicode, int, float, NoneType, bool, list and dict. You can't use it if you want to serialize more advanced objects such as class instances. Note that for those kinds of serialization, there is no hope of being language-agnostic.
Also, using cPickle instead of pickle partially resolves the speed problem.
