I wanted to see which is faster:
import numpy as np
np.sqrt(4)
-or-
from numpy import sqrt
sqrt(4)
Here is the code I used to find the average time to run each.
def main():
    import gen_funs as gf
    from time import perf_counter_ns

    t = 0
    N = 40
    for j in range(N):
        tic = perf_counter_ns()
        for i in range(100000):
            imp2()  # I ran the code with this, then with imp1()
        toc = perf_counter_ns()
        t += (toc - tic)
    t /= N

    time = gf.ns2hms(t)  # converts ns to a readable object
    print("Ave. time to run: {:d}h {:d}m {:d}s {:d}ms".format(
        time.hours, time.minutes, time.seconds, time.milliseconds))

def imp1():
    import numpy as np
    np.sqrt(4)
    return

def imp2():
    from numpy import sqrt
    sqrt(4)
    return

if __name__ == "__main__":
    main()
When I import numpy as np and then call np.sqrt(4), I get an average time of about 229ms (the time to run the inner loop 10**5 times).
When I run from numpy import sqrt and then call sqrt(4), I get an average time of about 332ms.
Since there is such a difference in run time, what is the benefit of running from numpy import sqrt? Is there a memory benefit or some other reason why I would do this?
I tried timing with the bash time command. I got 215ms for importing numpy and running sqrt(4), and 193ms for importing sqrt from numpy. Honestly, the difference is negligible.
However, importing parts of a module that you don't need is generally discouraged.
In this particular case there is no discernible performance benefit, and there are few situations in which you would import just numpy.sqrt: math.sqrt is roughly 4x faster for scalars, and the extra features numpy.sqrt offers are only usable if you have NumPy data, which would require importing the entire module anyway.
There might be a rare scenario in which you don't need all of numpy but still need numpy.sqrt, e.g. using pandas.DataFrame.to_numpy() and manipulating the data in some way, but honestly I don't think 20ms of speed is worth anything in the real world, especially since you saw worse performance when importing just numpy.sqrt.
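For reference, a minimal timeit sketch (not from the original post) of the same comparison, with each import done once in the setup string so that only the call itself is timed; math.sqrt is included since it was mentioned above:
import timeit

# Time just the call; the setup string runs the import once, outside the timed loop.
print(timeit.timeit("np.sqrt(4)", setup="import numpy as np", number=100000))
print(timeit.timeit("sqrt(4)", setup="from numpy import sqrt", number=100000))
print(timeit.timeit("sqrt(4)", setup="from math import sqrt", number=100000))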
What are some reasons that can cause the VS Code debugger to be slow? Any speculation or links to a VS Code limitation would be great.
I'm currently creating sliding windows over a numpy array using numpy.lib.stride_tricks.sliding_window_view, and for some reason, in one specific section of the code, the debugger just fails to compute large sliding windows.
len(df) > 10*24*60
win_size = 240
sliding_win = np.lib.stride_tricks.sliding_window_view(df.values,(win_size,len(df.columns)))
What is weird is that when I run it without the debugger, it is computed really fast. And if I run the code below (basically the snippet above extracted to a stub file, with an even larger dataframe and window size), the debugger can run it with no issue as well.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from time import time
...
# some random code to increase memory usage.
...
p = np.zeros((10*365*24*60,7))
t = time()
q = sliding_window_view(p,(24*60,7))
print(time()-t)
end = 1 # break point
from guppy import hpy
h = hpy()
print(h.heap())
When checked using guppy, the original code used about 100 MB and the stub file used more, so it doesn't seem to be a memory issue.
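As a side check (a minimal sketch, not part of the original code): sliding_window_view returns a strided view rather than a copy, so the windowed array itself should add essentially no memory on top of the underlying data:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

p = np.zeros((10*365*24*60, 7))
q = sliding_window_view(p, (24*60, 7))

print(q.base is not None)         # True: q is a view into p, not a copy
print(p.nbytes / 2**20, "MiB")    # the only real allocation is p itself
print(q.shape)                    # many windows, but no new data buffer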
I have a loop in which I'm calculating several pseudoinverses of rather large, non-sparse matrices (e.g. 20000x800).
As my code spends most of its time on the pinv, I was trying to find a way to speed up the computation. I'm already using multiprocessing (joblib/loky) to run with several processes, but that of course also increases overhead. Using jit did not help much.
Is there a faster way or a better implementation to compute the pseudoinverse using any function? Precision isn't key.
My current benchmark
import time

import numba
import numpy as np
from numpy.linalg import pinv as np_pinv
from scipy.linalg import pinv as scipy_pinv
from scipy.linalg import pinv2 as scipy_pinv2

@numba.njit
def np_jit_pinv(A):
    return np_pinv(A)

matrix = np.random.rand(20000, 800)
for pinv in [np_pinv, scipy_pinv, scipy_pinv2, np_jit_pinv]:
    start = time.time()
    pinv(matrix)
    print(f'{pinv.__module__ + "." + pinv.__name__} took {time.time()-start:.3f}')
numpy.linalg.pinv took 2.774
scipy.linalg.basic.pinv took 1.906
scipy.linalg.basic.pinv2 took 1.682
__main__.np_jit_pinv took 2.446
EDIT:
JAX seems to be 30% faster! Impressive! Thanks for letting me know, @yuri-brigance. On Windows it works well under WSL.
numpy.linalg.pinv took 2.774
scipy.linalg.basic.pinv took 1.906
scipy.linalg.basic.pinv2 took 1.682
__main__.np_jit_pinv took 2.446
jax._src.numpy.linalg.pinv took 0.995
Try with JAX:
import jax.numpy as jnp
jnp.linalg.pinv(A)
Seems to be slightly faster than regular numpy.linalg.pinv. On my machine your benchmark looks like this:
jax._src.numpy.linalg.pinv took 3.127
numpy.linalg.pinv took 4.284
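One benchmarking caveat (a minimal sketch, not from the original answer): JAX dispatches work asynchronously, so the result should be blocked on before stopping the timer, and the first call may include compilation overhead:
import time
import numpy as np
import jax.numpy as jnp

matrix = np.random.rand(20000, 800)

_ = jnp.linalg.pinv(matrix).block_until_ready()    # warm-up: the first call may include compilation

start = time.time()
_ = jnp.linalg.pinv(matrix).block_until_ready()    # block so the asynchronous computation has actually finished
print(f"jax pinv took {time.time() - start:.3f}")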
I'm trying to compare pyFFTW (in Python 3.6) with MATLAB R2017a's fft.
import time
import numpy
import pyfftw
import multiprocessing

nthread = multiprocessing.cpu_count()
print(nthread)

n = 2**20
a = pyfftw.empty_aligned(n, dtype='complex128')

print("fft_object = pyfftw.builders.fft(a)")
fft_object = pyfftw.builders.fft(a)  # this instruction spends much time (FFTW planning)

print("generate numbers")
a[:] = 5*numpy.random.rand(n)
print(a)

print("start fft")
start = time.clock()  # note: time.clock() works on Python 3.6 but was removed in 3.8
y = fft_object()
end4 = time.clock() - start

print("end, time:")
print(end4)

print("result")
print(y)
print(len(y))
whereas if I use MATLAB:
x=5*rand(2^20,1);tic;fft(x);toc
this measures just the time of the FFT computation itself, which is approximately the same as the time of the Python call to fft_object().
Thanks in advance for your kind support.
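For reference, a minimal sketch (not from the original post) of one way to shrink the one-time planning cost flagged in the comment above; planner_effort and threads are arguments accepted by the pyfftw.builders functions, and only the repeated fft_object() calls are comparable to MATLAB's tic;fft(x);toc:
import multiprocessing
import numpy
import pyfftw

n = 2**20
a = pyfftw.empty_aligned(n, dtype='complex128')

# FFTW_ESTIMATE plans much faster than the default planner effort,
# usually at the cost of a slightly less optimized transform.
fft_object = pyfftw.builders.fft(a, planner_effort='FFTW_ESTIMATE',
                                 threads=multiprocessing.cpu_count())

a[:] = 5*numpy.random.rand(n)
y = fft_object()   # repeated calls reuse the plan, so this is the part to time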
You might take a look at GPU-based codes (if you have the proper hardware):
http://pypi.python.org/pypi/pyfft
http://pypi.python.org/pypi/scikits.cuda
They are based on PyCUDA and PyOpenCL. I don't have much experience with these, so you'll have to do a little digging to find what suits you best.
This is one of the standard example codes you find everywhere...
import time
import numpy
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
import pycuda.autoinit

size = int(1e7)  # linspace expects an integer count

# CPU timing
t0 = time.time()
x = numpy.linspace(1, size, size).astype(numpy.float32)
y = numpy.sin(x)
t1 = time.time()
cpuTime = t1 - t0
print(cpuTime)

# GPU timing (includes the host-to-device and device-to-host transfers)
t0 = time.time()
x_gpu = gpuarray.to_gpu(x)
y_gpu = cumath.sin(x_gpu)
y = y_gpu.get()
t1 = time.time()
gpuTime = t1 - t0
print(gpuTime)
The results are: 200 ms for the CPU and 2.45 s for the GPU... more than 10x slower.
I'm running on Windows 10... VS 2015 with PTVS...
Best regards...
Steph
It looks like pycuda introduces some additional overhead the first time you call the cumath.sin() function (~400ms on my system). I suspect this is due to the need to compile CUDA code for the function being called. More importantly, this overhead is independent of the size of the array being passed to the function. Additional calls to cumath.sin() are much faster, with CUDA code already compiled for use. On my system, the gpu code given in the question runs in about 20ms (for repeated runs), compared to roughly 130ms for the numpy code.
I don't profess to know much at all about the inner workings of pycuda, so would be interested to hear other people's opinions on this.
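One way to check this (a minimal sketch, not from the original answer) is to make a throwaway warm-up call before timing, so the one-time compilation/setup cost is excluded from the measurement:
import time
import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath

size = int(1e7)
x = numpy.linspace(1, size, size).astype(numpy.float32)

x_gpu = gpuarray.to_gpu(x)
cumath.sin(x_gpu)            # throwaway warm-up call: absorbs the one-time setup cost

t0 = time.time()
y_gpu = cumath.sin(x_gpu)
y = y_gpu.get()              # copying the result back also synchronizes with the GPU
print(time.time() - t0)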
I'm fairly new to programming, but this problem happens in Python and in Excel as well.
I'm using the following formulas for the RC transfer function
s/(s+1) for High Pass
1/(s+1) for Low Pass
with s = jwRC
Below is the code I used in Python:
from pylab import *
from numpy import *
from cmath import *
"""
Generating a transfer function for RC filters.
Importing modules for complex math and plotting.
"""
f = arange(1, 5000, 1)
w = 2.0j*pi*f
R=100
C=1E-5
hp_tf = (w*R*C)/(w*R*C+1) # High Pass Transfer function
lp_tf = 1/(w*R*C+1) # Low Pass Transfer function
plot(f, hp_tf) # plot high pass transfer function
plot(f, lp_tf, '-r') # plot low pass transfer function
xscale('log')
I can't post images yet, so I can't show the plot. But the issue here is with the cutoff frequency: the two curves should cross at y = 0.707, but they actually cross at about 0.5.
I figure my formula or method is wrong somewhere, but I can't find the mistake. Can anyone help me out?
Also, on a related note, I tried to convert to dB scale and I get the following error:
TypeError: only length-1 arrays can be converted to Python scalars
I'm using the following
debl=20*log(hp_tf)
This is a classic example of why you should avoid pylab and, more generally, imports of the form
from module import *
unless you know exactly what they do, since they hopelessly clutter the namespace.
Using,
import matplotlib.pyplot as plt
import numpy as np
and then calling np.log and plt.plot etc. will solve your problem.
Further explanation
What's happening here is that,
from pylab import *
defines a log function from numpy that operates on arrays (the one you want).
However, the later import,
from cmath import *
overwrites it with a version that only accepts scalars, hence your error.
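For illustration, a minimal sketch of the question's computation with explicit namespaces (same R, C and frequency range as in the question). Taking np.abs() before plotting also makes the curves cross at the expected 0.707, since plotting the raw complex arrays only shows their real parts (which cross at 0.5), and dB conventionally uses log10 of the magnitude:
import numpy as np
import matplotlib.pyplot as plt

f = np.arange(1, 5000, 1)
w = 2.0j * np.pi * f
R = 100
C = 1e-5

hp_tf = (w * R * C) / (w * R * C + 1)   # high-pass transfer function
lp_tf = 1 / (w * R * C + 1)             # low-pass transfer function

hp_db = 20 * np.log10(np.abs(hp_tf))    # np.log10 works element-wise, so no TypeError

plt.plot(f, np.abs(hp_tf))              # magnitude responses cross at ~0.707
plt.plot(f, np.abs(lp_tf), '-r')
plt.xscale('log')
plt.show()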