I want to implement the following Matlab code in Python:
x=1:100;
y=20*log10(x);
I tried using Numpy to do this:
import math, numpy
x = numpy.arange(1, 101)  # the equivalent of Matlab's 1:100
y = numpy.zeros(x.shape)
for i in range(len(x)):
    y[i] = 20*math.log10(x[i])
But this uses a for loop; is there any way to do a vectorized operation like in Matlab? I know that for some simple math such as division and multiplication it's possible, but what about more sophisticated operations like the logarithm here?
y = numpy.log10(numpy.arange(1, 101)) * 20
In [30]: numpy.arange(1, 10)
Out[30]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [31]: numpy.log10(numpy.arange(1, 10))
Out[31]:
array([ 0. , 0.30103 , 0.47712125, 0.60205999, 0.69897 ,
0.77815125, 0.84509804, 0.90308999, 0.95424251])
In [32]: numpy.log10(numpy.arange(1, 10)) * 20
Out[32]:
array([ 0. , 6.02059991, 9.54242509, 12.04119983,
13.97940009, 15.56302501, 16.9019608 , 18.06179974, 19.08485019])
Yep, there certainly is.
x = numpy.arange(1, 101)  # note the 101: arange excludes the stop value, unlike Matlab's 1:100
y = 20 * numpy.log10(x)
Numpy has a lot of built-in array operations (ufuncs) like log10. If the operation you need isn't listed in numpy's documentation and you can't build it by combining the built-in ones, then there's no easy way to do it efficiently. You can implement a C-level function that works on numpy arrays and compile it, but that's a lot more work than one or two lines of Python code.
For your case you almost have the right output already:
y = 20*numpy.log10(x)
You may want to take a look at the Numpy documentation. This is a good place to start:
http://docs.scipy.org/doc/numpy/reference/routines.html
And specifically related to your question:
http://docs.scipy.org/doc/numpy/reference/routines.math.html
If you're not trying to do anything complicated, the original code can also be implemented without numpy at all, using plain Python:
>>> import math
>>> x = range(1, 101)
>>> y = [ 20 * math.log10(z) for z in x ]
Apart from performing vectorized operations with numpy's standard vectorized functions, you can also build a custom vectorized function using numpy.vectorize. Here is one example:
>>> def myfunc(a, b):
... "Return a-b if a>b, otherwise return a+b"
... if a > b:
... return a - b
... else:
... return a + b
>>> import numpy as np
>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])
As mentioned in the documentation, though, unlike numpy's standard vectorized functions, np.vectorize won't improve performance; it is essentially a for loop under the hood.
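If performance matters, the same conditional logic can often be pushed into numpy's own vectorized primitives instead. Here is a small sketch using np.where that reproduces the myfunc example above (not part of the original answer, just an illustration; np.where evaluates both branches and then selects element-wise):

import numpy as np

a = np.array([1, 2, 3, 4])
b = 2

# Pick a - b where a > b, and a + b everywhere else.
c = np.where(a > b, a - b, a + b)
print(c)  # [3 4 1 2]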
I have a function f, for example:
def f(x):
    return x**2
and want to obtain an array consisting of f evaluated over an interval, for example the unit interval (0,1). We can do this as follows:
import numpy as np
X = np.arange(0,1,0.01)
arr = np.array(list(map(f, X)))
However, this last line is very time consuming when the function is complicated (in my case it involves some integrals). Is there a way to do this faster? I am happy to have a non-elegant solution - the focus is on speed.
You could use a list comprehension to slightly decrease the runtime.
arr = [f(x) for x in range(0, 5)]  # replace range(0, 5) with your interval
This should work, but it will only slightly decrease the runtime. You shouldn't be too worried about runtime unless the inputs you pass to map() are very large.
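Applied to the interval from the question, that would look roughly like this (a sketch; the gain over map is usually small):

import numpy as np

def f(x):
    return x**2

X = np.arange(0, 1, 0.01)          # the questioner's interval (0, 1)
arr = np.array([f(x) for x in X])  # list comprehension instead of map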
If f is so complicated that it can't be expressed in terms of compiled array operations, and can only take scalars, I have found that np.frompyfunc gives the best performance (roughly 2x faster than an explicit loop).
In [76]: def f(x):
...: return x**2
...:
In [77]: foo = np.frompyfunc(f,1,1)
In [78]: foo(np.arange(4))
Out[78]: array([0, 1, 4, 9], dtype=object)
In [79]: foo(np.arange(4)).astype(int)
Out[79]: array([0, 1, 4, 9])
It returns an object-dtype array, so it needs an astype conversion afterwards. np.vectorize uses frompyfunc as well, but is a bit slower. Both generalize to input array(s) of various shapes.
For a 1d result, np.fromiter works directly on the map (without the list() wrapper) from your original code:
In [84]: np.fromiter((f(x) for x in range(4)),int)
Out[84]: array([0, 1, 4, 9])
In [86]: np.fromiter(map(f, range(4)),int)
Out[86]: array([0, 1, 4, 9])
You'll have to do your own timings in a realistic case.
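One way to set such a timing up (a sketch; the toy f and the sizes are just placeholders for your real case, and the absolute numbers depend on your machine):

import timeit
import numpy as np

def f(x):
    return x**2

foo = np.frompyfunc(f, 1, 1)
x = np.arange(10000)

# Compare the approaches; results will vary with the cost of the real f.
print(timeit.timeit(lambda: foo(x).astype(int), number=100))
print(timeit.timeit(lambda: np.fromiter(map(f, range(10000)), int), number=100))
print(timeit.timeit(lambda: np.array([f(i) for i in range(10000)]), number=100))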
Use operations that operate on entire arrays. For example, with a function that just squares the input (slightly corrected from your example):
def f(x):
    return x**2
then you'd just do
arr = f(X)
because NumPy defines operators like ** to operate on entire arrays at once.
Your real function might not be quite as straightforward. You say there are integrals involved; to make whole-array operations work with that, you might have to pass arguments differently or change what you're using to compute the integrals. In general, though, whole-array operations will vastly outperform anything that needs to call Python-level code in a loop.
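As an illustration of the integrals point (a sketch only; scipy.integrate.quad here is just a stand-in for whatever the real f computes):

import numpy as np
from scipy.integrate import quad

# Hypothetical f built around an integral, so it can only take scalars:
def f(x):
    value, _ = quad(lambda t: t**2, 0, x)
    return value

X = np.arange(0, 1, 0.01)

# With quad you still need a per-element loop (or np.vectorize / np.frompyfunc)...
arr_slow = np.array([f(x) for x in X])

# ...but if the integral has a closed form, the whole-array version is trivial:
# the integral of t**2 from 0 to x is x**3 / 3.
arr_fast = X**3 / 3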
You could try numpy.vectorize. It's a very good way to apply a function to a list or an array.
import numpy as np
def foo(x):
    return x**2
foo = np.vectorize(foo)
arr = np.arange(10)
In [1]: foo(arr)
Out[1]: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
I'm fairly new to numpy arrays and have encountered a problem when comparing one array with another.
I have two arrays, such that:
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])
I want to do something like the following:
if b > a:
    c = b
else:
    c = a
so that I end up with an array c = np.array([2,4,3,5,5]).
This can be otherwise thought of as taking the max value for each element of the two arrays.
However, I am running into the error
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all().
I have tried using these, but I'm not sure that they are right for what I want.
Is someone able to offer some advice in solving this?
You are looking for the function np.fmax. It takes the element-wise maximum of the two arrays, ignoring NaNs.
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])
c = np.fmax(a, b)
The output is
array([2, 4, 3, 5, 5])
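The "ignoring NaNs" part is what distinguishes np.fmax from np.maximum; your arrays contain no NaNs, so either works here, but for completeness (a small illustration):

import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 4.0, np.nan])

print(np.fmax(a, b))     # [2. 4. 3.]   NaNs are ignored where possible
print(np.maximum(a, b))  # [2. nan nan] NaNs propagate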
As with almost everything else in numpy, comparisons are done element-wise, returning a whole array:
>>> b > a
array([ True, True, False, True, False], dtype=bool)
So, is that true or false? What should an if statement do with it?
Numpy's answer is that it shouldn't try to guess, it should just raise an exception.
If you want to consider it true because at least one value is true, use any:
>>> if np.any(b > a): print('Yes!')
Yes!
If you want to consider it false because not all values are true, use all:
>>> if np.all(b > a): print('Yes!')
But I'm pretty sure you don't want either of these. You want to broadcast the whole if/else over the array.
You could of course wrap the if/else logic for a single value in a function, then explicitly vectorize it and call it:
>>> def mymax(a, b):
...     if b > a:
...         return b
...     else:
...         return a
>>> vmymax = np.vectorize(mymax)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
This is worth knowing how to do… but very rarely worth doing. There's usually a more indirect way to do it using natively-vectorized functions—and often a more direct way, too.
One way to do it indirectly is by using the fact that True and False are numerical 1 and 0:
>>> (b>a)*b + (b<=a)*a
array([2, 4, 3, 5, 5])
This computes 1*b[i] + 0*a[i] where b > a, and 0*b[i] + 1*a[i] where b <= a. A bit ugly, but not too hard to understand. There are clearer, but more verbose, ways to write this.
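For example, np.where expresses the same select-by-condition idea directly (shown here just as an aside):
>>> np.where(b > a, b, a)
array([2, 4, 3, 5, 5])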
But let's look for an even better, direct solution.
First, notice that your mymax function will do exactly the same as Python's built-in max, for 2 values:
>>> vmymax = np.vectorize(max)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
Then consider that for something so useful, numpy probably already has it. And a quick search will turn up maximum:
>>> np.maximum(a, b)
array([2, 4, 3, 5, 5])
Here's another way of achieving this:
c = np.array([y if y>z else z for y,z in zip(a,b)])
The following methods also work:
Use numpy.maximum
>>> np.maximum(a, b)
Use numpy.max and numpy.vstack
>>> np.max(np.vstack((a, b)), axis=0)
This may not be the most efficient approach, but it is closer in spirit to the explicit if/else in the original question:
import numpy as np
c = np.zeros(shape=(5,1))
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])
for i in range(5):
    if b.item(i) > a.item(i):
        c[i] = b.item(i)
    else:
        c[i] = a.item(i)
I would like to know how to do:
- dot multiplication
- cross multiplication
- add/sub
of vectors with the sympy library. I have tried looking into the official documentation, but I had no luck or it was too complicated. Can anyone help me out on this?
I was trying to do this simple operation
a · b = |a| × |b| × cos(θ)
To do vector dot and cross products with sympy, you have to import CoordSys3D, which provides the basis vectors. Here is a working code example:
from sympy.vector import CoordSys3D
N = CoordSys3D('N')
v1 = 2*N.i+3*N.j-N.k
v2 = N.i-4*N.j+N.k
v1.dot(v2)
v1.cross(v2)
# Alternatively, you can also write
v1 & v2
v1 ^ v2
Please note that the last two lines are not recommended by the sympy documentation; it is better to use the methods explicitly. Personally, I think this is a matter of preference, however.
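To connect this back to the a · b = |a| × |b| × cos(θ) formula from the question, here is a sketch (it assumes sympy.vector vectors expose a magnitude() method for |v|, which they should):

from sympy import acos
from sympy.vector import CoordSys3D

N = CoordSys3D('N')
v1 = 2*N.i + 3*N.j - N.k
v2 = N.i - 4*N.j + N.k

# Recover theta from a . b = |a| |b| cos(theta)
theta = acos(v1.dot(v2) / (v1.magnitude() * v2.magnitude()))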
You can do it as described here: https://docs.sympy.org/latest/modules/matrices/matrices.html?highlight=cross#sympy.matrices.matrices.MatrixBase.cross
For example:
>>> from sympy import Matrix
>>> M = Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> v = Matrix([1, 1, 1])
>>> M.row(0).dot(v)
6
>>> M.col(0).dot(v)
12
>>> v = [3, 2, 1]
>>> M.row(0).dot(v)
10
http://docs.sympy.org/0.7.2/modules/physics/mechanics/api/functions.html
There are examples in that doc and some code here. What exactly don't you understand? Maybe try to be more specific.
The dot product, as you write it, is explained in the doc with this example:
from sympy.physics.mechanics import ReferenceFrame, Vector, dot
from sympy import symbols
q1 = symbols('q1')
N = ReferenceFrame('N') # so, ||x|| = ||y|| = ||z|| = 1
dot(N.x, N.x)
1 # it is ||N.x||*||N.x||*cos(Nx,Nx) = 1
dot(N.x, N.y)
0 # it is ||N.x||*||N.y||*cos(Nx,Ny)
A = N.orientnew('A', 'Axis', [q1, N.x])
dot(N.y, A.y)
cos(q1)
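The cross product works the same way in that module; a short sketch (cross should be importable alongside dot):

from sympy.physics.mechanics import ReferenceFrame, cross

N = ReferenceFrame('N')
cross(N.x, N.y)   # N.z, since x cross y = z for a right-handed orthonormal frame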
Also, you might consider doing it with numpy...
numpy is designed for this; it is a clean and fast way to do numerical calculations because it's implemented in C.
In [36]: x = [1, 2, 3]
...: y = [4, 5, 6]
In [37]: import numpy as np
...: print(np.dot(x, y))
...: print(np.cross(x, y))
...: print(np.add(x, y))  # np.subtract, etc.
32
[-3 6 -3]
[5 7 9]
There is a discussion on numpy and sympy on google groups.
If you have symbolic vectors and need to use sympy, it is actually very simple: just use the cross method of Matrix, as shown below:
import sympy as s
a,b,c,x,y,z = s.symbols("a,b,c,x,y,z")
v1 = s.Matrix([a,b,c])
v2 = s.Matrix([x,y,z])
cross_result = v1.cross(v2)
print(cross_result)
With output:
Matrix([
[ b*z - c*y],
[-a*z + c*x],
[ a*y - b*x]])
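The dot product of the same symbolic vectors is just as direct (continuing the snippet above; the printed result should be a*x + b*y + c*z):

dot_result = v1.dot(v2)
print(dot_result)  # a*x + b*y + c*z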
I had a similar issue, and the approach I followed was simply to put the sympy expressions/variables inside np.array() and use np.cross().
For example:
np.cross(np.array(M),np.array(N))
where M and N are variables computed using sympy.
I need to generate a lot of random numbers. I've tried using random.random, but this function is quite slow, so I switched to numpy.random.random, which is way faster! So far so good. The generated random numbers are actually used to calculate something (based on the number). I therefore enumerate over each number and replace the value. This seems to kill all my previously gained speedup. Here are the stats generated with timeit():
test_random - no enumerate
0.133111953735
test_np_random - no enumerate
0.0177130699158
test_random - enumerate
0.269361019135
test_np_random - enumerate
1.22525310516
As you can see, generating the numbers is almost 10 times faster using numpy, but enumerating over those numbers wipes out that advantage entirely (the numpy version even ends up slower).
Below is the code that I'm using:
import numpy as np
import timeit
import random
NBR_TIMES = 10
NBR_ELEMENTS = 100000
def test_random(do_enumerate=False):
    y = [random.random() for i in range(NBR_ELEMENTS)]
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value; in reality this will be some function of 'item'
            y[index] = 1 + item

def test_np_random(do_enumerate=False):
    y = np.random.random(NBR_ELEMENTS)
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value; in reality this will be some function of 'item'
            y[index] = 1 + item
if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test_random()", "from __main__ import test_random")
    print("test_random - no enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_np_random()", "from __main__ import test_np_random")
    print("test_np_random - no enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_random(True)", "from __main__ import test_random")
    print("test_random - enumerate")
    print(t.timeit(NBR_TIMES))
    t = Timer("test_np_random(True)", "from __main__ import test_np_random")
    print("test_np_random - enumerate")
    print(t.timeit(NBR_TIMES))
What's the best way to speed this up and why does enumerate slow things down so dramatically?
EDIT: the reason I use enumerate is that I need both the index and the value of the current element.
To take full advantage of numpy's speed, you want to create ufuncs whenever possible. Applying vectorize to a function as mgibsonbr suggests is one way to do that, but a better way, if possible, is simply to construct a function that takes advantage of numpy's built-in ufuncs. So something like this:
>>> import numpy
>>> a = numpy.random.random(10)
>>> a + 1
array([ 1.29738145, 1.33004628, 1.45825441, 1.46171177, 1.56863326,
1.58502855, 1.06693054, 1.93304272, 1.66056379, 1.91418473])
>>> (a + 1) * 0.25 / 4
array([ 0.08108634, 0.08312789, 0.0911409 , 0.09135699, 0.09803958,
0.09906428, 0.06668316, 0.12081517, 0.10378524, 0.11963655])
What is the nature of the function you want to apply across the numpy array? If you tell us, perhaps we can help you come up with a version that uses only numpy ufuncs.
It's also possible to generate an array of indices without using enumerate. Numpy provides ndenumerate, which is an iterator, and probably slower, but it also provides indices, which is a very quick way to generate the indices corresponding to the values in an array. So...
>>> numpy.indices(a.shape)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
So to be more explicit, you can use the above and combine them using numpy.rec.fromarrays:
>>> a = numpy.random.random(10)
>>> ind = numpy.indices(a.shape)
>>> numpy.rec.fromarrays([ind[0], a])
rec.array([(0, 0.092473494150913438), (1, 0.20853257641948986),
(2, 0.35141455604686067), (3, 0.12212258656960817),
(4, 0.50986868372639049), (5, 0.0011439325711705139),
(6, 0.50412473457942508), (7, 0.28973489788728601),
(8, 0.20078799423168536), (9, 0.34527678271856999)],
dtype=[('f0', '<i8'), ('f1', '<f8')])
It's starting to sound like your main concern is performing the operation in-place. That's harder to do using vectorize but it's easy with the ufunc approach:
>>> def somefunc(a):
...     a += 1
...     a /= 15
...
>>> a = numpy.random.random(10)
>>> b = a
>>> somefunc(a)
>>> a
array([ 0.07158446, 0.07052393, 0.07276768, 0.09813235, 0.09429439,
0.08561703, 0.11204622, 0.10773558, 0.11878885, 0.10969279])
>>> b
array([ 0.07158446, 0.07052393, 0.07276768, 0.09813235, 0.09429439,
0.08561703, 0.11204622, 0.10773558, 0.11878885, 0.10969279])
As you can see, numpy performs these operations in-place.
Check out numpy.vectorize; it should let you apply arbitrary functions to numpy arrays. For your simple example, you'd do something like this:
vecFunc = numpy.vectorize(lambda x: x + 1)
vecFunc(y)
However, that will create a new numpy array instead of modifying it in-place (which may or may not be a problem in your particular case).
In general, you'll always be better off manipulating numpy structures with numpy functions than iterating with Python functions, since the former are not only optimized but implemented in C, while the latter will always be interpreted.
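For the specific loop in the question, the whole enumerate pass collapses into a single array expression; a sketch (the toy 1 + item update stands in for the real function):

import numpy as np

NBR_ELEMENTS = 100000
y = np.random.random(NBR_ELEMENTS)

# Replaces the entire enumerate loop in one shot.
y = 1 + y

# If the real update also needs the index, build the indices as an array and
# keep everything vectorized, e.g. something like y = 1 + y * np.arange(len(y)).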
Is it possible to construct a numpy matrix from a function? In this case specifically the function is the absolute difference of two vectors: S[i,j] = abs(A[i] - B[j]). A minimal working example that uses regular python:
import numpy as np
A = np.array([1,3,6])
B = np.array([2,4,6])
S = np.zeros((3,3))
for i,x in enumerate(A):
    for j,y in enumerate(B):
        S[i,j] = abs(x-y)
Giving:
[[ 1.  3.  5.]
 [ 1.  1.  3.]
 [ 4.  2.  0.]]
It would be nice to have a construction that looks something like:
def build_matrix(shape, input_function, *args)
where I can pass an input function with its arguments and retain the speed advantage of numpy.
In addition to what @JoshAdel has suggested, you can also use the outer method of any numpy ufunc to do the broadcasting in the case of two arrays.
In this case, you just want np.subtract.outer(A, B) (Or, rather, the absolute value of it).
While either one is fairly readable for this example, in some cases broadcasting is more useful, while in others using ufunc methods is cleaner.
Either way, it's useful to know both tricks.
E.g.
import numpy as np
A = np.array([1,3,6])
B = np.array([2,4,6])
diff = np.subtract.outer(A, B)
result = np.abs(diff)
Basically, you can use outer, accumulate, reduce, and reduceat with any numpy ufunc such as subtract, multiply, divide, or even things like logical_and, etc.
For example, np.cumsum is equivalent to np.add.accumulate. This means you could implement something like a cumdiv with np.divide.accumulate if you ever needed to.
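A quick illustration of those ufunc methods (outputs shown in the comments):

import numpy as np

np.add.accumulate(np.array([1, 2, 3, 4]))        # array([ 1,  3,  6, 10]), same as np.cumsum
np.divide.accumulate(np.array([8.0, 4.0, 2.0]))  # array([8., 2., 1.]), a running "cumdiv"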
I recommend taking a look into numpy's broadcasting capabilities:
In [6]: np.abs(A[:,np.newaxis] - B)
Out[6]:
array([[1, 3, 5],
[1, 1, 3],
[4, 2, 0]])
http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
Then you could simply write your function as:
In [7]: def build_matrix(func,args):
...: return func(*args)
...:
In [8]: def f1(A,B):
...: return np.abs(A[:,np.newaxis] - B)
...:
In [9]: build_matrix(f1,(A,B))
Out[9]:
array([[1, 3, 5],
[1, 1, 3],
[4, 2, 0]])
This should also be considerably faster than your solution for larger arrays.