Let us say I have a NumPy matrix A of size Nx2. I am computing the 4-quadrant inverse tangent of the first column and the second column, like so:
import math
import numpy as np

phase = np.empty(A.shape[0])  # one angle per row of A
for i in range(A.shape[0]):
    phase[i] = math.atan2(A[i, 0], A[i, 1])
I would however like to do this in a vectorized manner. How can I do that? The math.atan2() function does not seem to support vectorization.
Thanks!
It looks to me like it should just be:
import numpy as np
phase = np.arctan2(A[:, 0], A[:, 1])
Or, if phase already exists and is longer than A for some odd reason:
phase[:len(A)] = np.arctan2(A[:, 0], A[:, 1])
In other words, don't use math.atan2; use numpy.arctan2, since NumPy functions are generally vectorized versions of their math counterparts.
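A quick way to convince yourself that the vectorized call matches the loop is to compare both on the same input. This is a minimal sketch; the random test matrix `A` is an assumption made only for the check.

```python
import math
import numpy as np

# Illustrative Nx2 input (random values, chosen only for this comparison)
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 2))

# Loop version from the question, using math.atan2 element by element
phase_loop = np.empty(A.shape[0])
for i in range(A.shape[0]):
    phase_loop[i] = math.atan2(A[i, 0], A[i, 1])

# Vectorized version: np.arctan2 operates on whole columns at once
phase_vec = np.arctan2(A[:, 0], A[:, 1])

print(np.allclose(phase_loop, phase_vec))  # True
```

Note that np.arctan2(y, x) takes y first, just like math.atan2, so the column order carries over unchanged.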
I know that using Python's random.choices I can do this:
import random
array_probabilities = [0.5 for _ in range(4)]
print(array_probabilities) # [0.5, 0.5, 0.5, 0.5]
a = [random.choices([0, 1], weights=[1 - probability, probability])[0] for probability in array_probabilities]
print(a) # [1, 1, 1, 0]
How can I make a NumPy array of 0s and 1s based on a probability array?
Using random.choices is fast, but I know numpy is even faster. I would like to know how to write the same code but using numpy. I'm just getting started with numpy and would appreciate your feedback.
One option:
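import numpy as np
# 1 with probability p: a uniform draw in [0, 1) falls below p with probability p
out = (np.random.random(size=len(array_probabilities)) < array_probabilities).astype(int)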
Example output:
array([0, 1, 0, 1])
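A sketch of the same idea with NumPy's newer Generator API, assuming NumPy 1.17 or later (where `np.random.default_rng` exists). Drawing a uniform number u in [0, 1) and emitting 1 when u < p gives 1 with probability p, which matches `weights=[1 - p, p]` in the `random.choices` version. The helper name `bernoulli_array` is hypothetical.

```python
import numpy as np

def bernoulli_array(probabilities, rng=None):
    """Return an int array with 1 at index i with probability probabilities[i]."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(probabilities, dtype=float)
    # u < p is True with probability p because u is uniform on [0, 1)
    return (rng.random(p.shape) < p).astype(int)

print(bernoulli_array([0.5, 0.5, 0.5, 0.5]))  # random 0/1 values, varies per run
```

Generator.binomial(1, p) with an array p is another way to express the same per-element draw.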
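Your question got me wondering, so I wrote a basic function to compare their timings. And it seems you are right! The timings differ, but only a little. Here is the code:
import numpy as np
import time
import random

def stack_question():
    # Elapsed time of the list-based random.choices approach, in milliseconds
    start = time.perf_counter()
    array_probabilities = [0.5 for _ in range(4)]
    a = [random.choices([0, 1], weights=[1 - probability, probability])[0] for probability in array_probabilities]
    end = time.perf_counter()
    return (end - start) * 1000

def numpy_random_array():
    # Elapsed time of the equivalent NumPy approach, in milliseconds
    start_time = time.perf_counter()
    array_probabilities = np.full(4, 0.5)
    val = (np.random.rand(4) < array_probabilities).astype(int)
    end_time = time.perf_counter()
    return (end_time - start_time) * 1000

print("List implementation ", stack_question())
print("Array implementation ", numpy_random_array())
The first version of this code printed:
List implementation 1665476650232.8433
Array implementation 1665476650233.9226
Those numbers are not elapsed times: that version multiplied only the start time by 1000 and returned start minus end, so the result was essentially the current Unix time in milliseconds.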
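For a fairer comparison, here is a sketch using `timeit` on a larger input. The input size (10,000 probabilities of 0.5) and the repetition counts are arbitrary assumptions; with only 4 elements, fixed call overhead tends to dominate either version.

```python
import random
import timeit

import numpy as np

n = 10_000  # arbitrary size chosen for the comparison
array_probabilities = [0.5] * n
p = np.asarray(array_probabilities)
rng = np.random.default_rng()

def list_version():
    # One random.choices call per probability, as in the question
    return [random.choices([0, 1], weights=[1 - q, q])[0] for q in array_probabilities]

def numpy_version():
    # One vectorized comparison over the whole array
    return (rng.random(n) < p).astype(int)

for name, fn in [("list", list_version), ("numpy", numpy_version)]:
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {seconds * 1000:.3f} ms per call")
```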
Edit: On GeeksforGeeks I found the following explanation of why it is faster to use NumPy arrays.
NumPy Arrays are faster than Python Lists because of the following reasons:
An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.
The NumPy package breaks a task down into multiple fragments and then processes the fragments in parallel.
The NumPy package integrates C, C++, and Fortran code in Python. Code in these languages executes much faster than Python.
import numpy as np
probabilities = np.random.rand(1, 10)
# Map each value to 1 if it exceeds 0.5, else 0
bools_arr = np.apply_along_axis(lambda x: 1 if x > 0.5 else 0, 1, [probabilities])
I have an array of values a = (2,3,0,0,4,3)
y = 0
for x in a:
    y = (y + x) * .95
Is there any way to use cumsum in numpy and apply the .95 decay to each row before adding the next value?
You're asking for a simple IIR Filter. Scipy's lfilter() is made for that:
import numpy as np
from scipy.signal import lfilter

data = np.array([2, 3, 0, 0, 4, 3], dtype=float)  # lfilter wants floats

# Conventional approach:
result_conv = []
last_value = 0
for elmt in data:
    last_value = (last_value + elmt)*.95
    result_conv.append(last_value)

# IIR Filter:
result_IIR = lfilter([.95], [1, -.95], data)
if np.allclose(result_IIR, result_conv, 1e-12):
    print("Values are equal.")
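To see where the filter coefficients come from, write the loop as a difference equation. SciPy's `lfilter(b, a, x)` computes $a_0 y[n] = \sum_k b_k x[n-k] - \sum_{k \ge 1} a_k y[n-k]$ starting from a zero initial state (matching `last_value = 0`), so matching terms gives:

```latex
y[n] = 0.95\,\bigl(y[n-1] + x[n]\bigr)
\;\Longrightarrow\;
y[n] - 0.95\,y[n-1] = 0.95\,x[n]
\;\Longrightarrow\;
b = [0.95], \quad a = [1,\ -0.95]
```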
If you're only dealing with a 1D array, then short of SciPy conveniences or writing a custom reduce ufunc for NumPy, you can use itertools.accumulate in Python 3.3+, e.g.:
from itertools import accumulate
a = (2,3,0,0,4,3)
y = list(accumulate(a, lambda x,y: (x+y)*0.95))
# [2, 4.75, 4.5125, 4.286875, 7.87253125, 10.3289046875]
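One detail worth checking: `accumulate` starts from the first element itself, so in the output above `a[0]` is not multiplied by 0.95, while the loop in the question produces `(0 + 2) * 0.95 = 1.9` first. A sketch that matches the loop exactly, assuming Python 3.8+ for the `initial` argument, seeds the accumulation with 0 and drops that seed:

```python
from itertools import accumulate

a = (2, 3, 0, 0, 4, 3)
decay = 0.95

# initial=0 plays the role of y = 0 before the loop; [1:] drops that seed value
y = list(accumulate(a, lambda acc, x: (acc + x) * decay, initial=0))[1:]

# Reference: the loop from the question, recording every intermediate value
ref = []
acc = 0
for x in a:
    acc = (acc + x) * decay
    ref.append(acc)

print(y == ref)  # True
```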
Numba provides an easy way to vectorize a function, creating a universal function (thus providing ufunc.accumulate):
import numpy
from numba import vectorize, float64
@vectorize([float64(float64, float64)])
def f(x, y):
    return 0.95 * (x + y)
>>> a = numpy.array([2, 3, 0, 0, 4, 3])
>>> f.accumulate(a)
array([ 2. , 4.75 , 4.5125 , 4.286875 ,
7.87253125, 10.32890469])
I don't think that this can be done easily in NumPy alone, without using a loop.
One array-based idea would be to calculate the matrix M_ij = .95**i * a[N-j] (where N is the number of elements in a). The numbers that you are looking for are found by summing entries diagonally (with i-j constant). You could thus use multiple numpy.diagonal(…).sum() calls.
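A loop-free variation on the same idea (not exactly the diagonal-sum formulation) builds a lower-triangular weight matrix W with W[n, k] = 0.95**(n - k + 1) for k <= n, so that every running value is a single dot product. This is a sketch only: it needs O(N²) memory and time, so it is only sensible for short arrays. The function name `decayed_cumsum` is hypothetical.

```python
import numpy as np

def decayed_cumsum(a, decay=0.95):
    """y[n] = decay * (y[n-1] + a[n]) with y[-1] = 0, computed without a Python loop."""
    a = np.asarray(a, dtype=float)
    idx = np.arange(len(a))
    # Exponent n - k + 1 for every (output n, input k) pair
    powers = idx[:, None] - idx[None, :] + 1
    # Clip exponents above the diagonal (k > n) to 1; np.tril then zeroes those entries
    W = np.tril(decay ** np.maximum(powers, 1).astype(float))
    return W @ a

print(decayed_cumsum([2, 3, 0, 0, 4, 3]))
```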
The good old algorithm that you outline is clearer and probably quite fast already (otherwise you can use Cython).
Doing what you want through NumPy without a single loop sounds like wizardry to me. Hats off to anybody who can pull this off.
In Python's NumPy module, is there a function that can evaluate long/advanced math expressions on an array? I have heard of the numexpr module but want to stay clear of further dependencies.
Better yet, can I limit these expressions to only say the first or second element of the sub arrays within my array, without having to unpack them as separate arrays?
Here is my specific problem. I have an array of arrays containing geographic point coordinates looking like this: [[x1,y1],[x2,y2],[x3,y3],etc...]. What I want is to transform these geocoords to pixel coordinates so they can be drawn on an image. I therefore want to run the following expression on the first element of each subarray, i.e. the xs:
((180+X)/360)*screenwidthpixels
And on the second element, i.e. the ys:
((-90+Y)/180)*-screenheightpixels
These expressions would work in a Python for-loop, but that is too slow, which is why I'm turning to NumPy. I know I can chain NumPy's individual math operator functions, and I have tried that, but it is still too slow. Besides, to do that I first had to unpack all the xs and ys into separate arrays and repack them after the calculation, making it even slower.
So I guess I'm looking for a more direct Numpy way using less steps to transform my coordinate array using the expressions above. Any ideas?
import numpy as np
points = np.random.rand(10,2)
translation = np.array([180,-90])
scaling = np.array([1024, -768]) / np.array([360,180])
transformed_points = (points + translation) * scaling
This will do what you are looking for. It relies on numpy broadcasting rules to achieve expressiveness and performance.
But rather than explaining exactly how that works, I think you are better off finding yourself a good NumPy primer and starting at the top. NumPy is one of the best things about Python, and you can't go wrong learning a little more about it. Suffice to say, NumPy is certainly up to the kind of task you are facing.
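Written directly in terms of the two expressions from the question, the same broadcasting approach looks like this. `screenwidthpixels = 1024` and `screenheightpixels = 768` are assumed values, and the sample coordinates are made up for illustration.

```python
import numpy as np

screenwidthpixels, screenheightpixels = 1024, 768  # assumed screen size

# N x 2 array of [x, y] geographic coordinates (illustrative values)
coords = np.array([[-180.0, 90.0], [0.0, 0.0], [180.0, -90.0]])

# Per-column offset and scale; broadcasting applies them to every row
offset = np.array([180.0, -90.0])
scale = np.array([screenwidthpixels / 360.0, -screenheightpixels / 180.0])
pixels = (coords + offset) * scale

print(pixels)
```

With these values the map corners land on (0, 0) and (1024, 768), and the geographic origin lands on the screen centre (512, 384), without ever splitting the array into separate x and y arrays.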
I'm a little confused because I'm not sure exactly what you're saying you already tried, or what the speed condition for success is.
Are you saying you already tried something like the following, but it is too slow?
arr = whatever
arr[:,0] = (arr[:,0] + 180) / (360 * screenwidthpixels)
arr[:,1] = 180 - (arr[:,1] - 90) / (180 * screenheightpixels)
I'm not sure what you mean by "having to unpack" to X and Y. Here's how you avoid unpacking (if I understand correctly):
arr = np.array([ [x1,y1], [x2,y2], [x3,y3] ])
arr.shape
=> (3, 2)
X = arr[:,0] # fast, creates a view
Y = arr[:,1] # fast too
((X+180)/360)/screenwidthpixels
Further speedup can be achieved by rewriting/simplifying your expressions.
((X+180)/360)/s => (X+180)/(360*s)
(180-((Y+90)/180))/s => (180/s-1/(2*s)) - Y/(180*s)
In the first rewrite the array is traversed twice instead of 3 times, and in the second it is traversed twice instead of 4 times.
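The two rewrites can be checked numerically. This is a sketch with an arbitrary sample array and an assumed screen size `s = 1024`; it also confirms that the column slices share memory with `arr`, i.e. they are views rather than copies.

```python
import numpy as np

s = 1024.0  # assumed screen size in pixels
arr = np.random.default_rng(0).uniform(-90, 90, size=(1000, 2))
X = arr[:, 0]  # view of the first column
Y = arr[:, 1]  # view of the second column

# Original forms and their simplified rewrites
x_orig = ((X + 180) / 360) / s
x_fast = (X + 180) / (360 * s)
y_orig = (180 - ((Y + 90) / 180)) / s
y_fast = (180 / s - 1 / (2 * s)) - Y / (180 * s)

print(np.allclose(x_orig, x_fast), np.allclose(y_orig, y_fast))  # True True
print(np.shares_memory(X, arr), np.shares_memory(Y, arr))  # True True
```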
In [235]: xs=arange(1000)
In [236]: ys=arange(1, 1001)
In [237]: a=array([xs, ys]).T
In [238]: a
Out[238]:
array([[ 0, 1],
[ 1, 2],
[ 2, 3],
...,
[ 997, 998],
[ 998, 999],
[ 999, 1000]])
In [239]: a = a.astype(float)  # integer columns would truncate the results below to 0
In [240]: a[:, 0]=(a[:, 0]+180)/360/1024
a[:, 0] is a view of the first column of a, so it is fast and saves memory.
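One pitfall with writing results back into a column: the assignment keeps the array's dtype, so if `a` holds integers (as `arange` produces), the fractional results are silently truncated. A small sketch with made-up values:

```python
import numpy as np

a = np.array([np.arange(5), np.arange(1, 6)]).T  # integer array, like the session above

ints = a.copy()
ints[:, 0] = (ints[:, 0] + 180) / 360 / 1024  # results below 1 are truncated to 0
print(ints[:, 0])  # [0 0 0 0 0]

floats = a.astype(float)  # convert first so the column can hold fractions
floats[:, 0] = (floats[:, 0] + 180) / 360 / 1024
print(floats[:, 0])
```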
I want to create an array in NumPy that contains the values of a mathematical series, in this example the square of the previous value, given a single starting value, i.e. a_0 = 2, a_1 = 4, a_2 = 16, ...
Trying to use the vectorization in numpy I thought this might work:
import numpy as np
a = np.array([2,0,0,0,0])
a[1:] = a[0:-1]**2
but the outcome is
array([2, 4, 0, 0, 0])
I have now learned that NumPy evaluates the right-hand side into a temporary array first and only then copies it into a[1:], so every element is computed from the original values; that is why the entries that were zero in the original array stay zero.
Is there a way to vectorize this function using numpy, numexpr or other tools? What other ways are there to effectively calculate the values of a series when fast numpy functions are available without going for a for loop?
There is no general way to vectorise recursive sequence definitions in NumPy. This particular case is rather easy to write without a for-loop though:
>>> 2 ** 2 ** numpy.arange(5)
array([ 2, 4, 16, 256, 65536])
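The trick generalises to any starting value: unrolling a_n = a_{n-1}**2 gives a_n = a_0**(2**n). A sketch, assuming float output is acceptable; with int64 the values overflow quickly (2**(2**6) already exceeds the int64 range), and in floating point they eventually overflow to inf. The function name `squaring_series` is hypothetical.

```python
import numpy as np

def squaring_series(a0, n):
    """First n terms of a_k = a_{k-1}**2 with a_0 = a0, via the closed form a0**(2**k)."""
    exponents = 2.0 ** np.arange(n)
    return np.power(float(a0), exponents)

print(squaring_series(2, 5))
```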
I'm looking for dynamically growing vectors in Python, since I don't know their length in advance. In addition, I would like to calculate distances between these sparse vectors, preferably using the distance functions in scipy.spatial.distance (although any other suggestions are welcome). Any ideas how to do this? (Initially, it doesn't need to be efficient.)
Thanks a lot in advance!
You can use regular python lists (which are dynamic) as vectors. Trivial example follows.
from scipy.spatial.distance import sqeuclidean
a = [1,2,3]
b = [0,0,0]
print(sqeuclidean(a, b))  # 14
As per aganders3's suggestion, do note that you can also use numpy arrays if needed:
import numpy
a = numpy.array([1,2,3])
If the sparse part of your question is crucial, I'd use SciPy for that; it has support for sparse matrices. You can define a 1xn matrix and use it as a vector. This works (the parameter is the shape of the matrix, filled with zeros by default):
sqeuclidean(scipy.sparse.coo_matrix((1,3)),scipy.sparse.coo_matrix((1,3))) # 0
There are many kinds of sparse matrices, some of them dictionary-based (e.g. scipy.sparse.dok_matrix). You can define a sparse row matrix from a list like this:
scipy.sparse.csr_matrix([1,2,3])
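If you want to stay sparse end to end, the squared Euclidean distance can also be computed with sparse operations instead of `scipy.spatial.distance`. A sketch using `csr_matrix` row vectors (the example values are illustrative):

```python
from scipy.sparse import csr_matrix

a = csr_matrix([1, 2, 3])
b = csr_matrix([0, 0, 0])

diff = a - b  # the difference stays sparse
# Element-wise square, then sum; only stored (non-zero) entries are touched
sq_dist = diff.multiply(diff).sum()
print(sq_dist)  # 14
```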
Here is how you can do it in numpy:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([0, 0, 0])
c = np.sum((a - b) ** 2)  # 14
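Since the question also asks for vectors that grow dynamically, one simple option (a sketch, not tied to any library) is a plain dict mapping index to value: new dimensions are added by assigning a new key, and missing keys count as zero. The helper name `sparse_sqeuclidean` is hypothetical.

```python
def sparse_sqeuclidean(u, v):
    """Squared Euclidean distance between two {index: value} sparse vectors."""
    keys = u.keys() | v.keys()  # dimensions present in either vector
    return sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys)

a = {0: 1, 1: 2, 2: 3}
b = {}       # an all-zero vector
a[10] = 1    # grow a to a new dimension on the fly
print(sparse_sqeuclidean(a, b))  # 15
```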