Given the two matrices f and x:
import numpy as np

def f11(x): return 1
def f12(x): return x+1
def f21(x): return np.log(x)
def f22(x): return np.exp(x)
f = np.matrix([[f11,f12],[f21,f22]])
x = np.matrix([[10,5],[3,8]])
How can I apply the matrix of functions f element-wise to x? (The real functions may be more complex; this is just an example.)
NumPy matrices are simply not designed to support this. Instead, write one function that accepts an array and returns the expected result. The reason to use an array rather than a matrix is that arrays are more flexible and play better with Python idioms, such as the tuple unpacking used here.
In [41]: def apply_f(matrix):
...: ((x, y), (z, t)) = matrix
...: return np.array([[1, y +1], [np.log(z), np.exp(t)]])
...:
In [42]: x = np.array([[3, 5], [10, 8]])
In [43]: apply_f(x)
Out[43]:
array([[1.00000000e+00, 6.00000000e+00],
[2.30258509e+00, 2.98095799e+03]])
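If the function matrix is not fixed in advance, a loop over the object array is hard to avoid, since NumPy cannot vectorize calls to arbitrary Python callables. A minimal sketch, assuming f is a 2D object array (or matrix) of unary functions like the one in the question:
def apply_fmat(f, x):
    # call each stored function on the matching element of x
    out = np.empty(x.shape, dtype=float)
    for (i, j), func in np.ndenumerate(np.asarray(f)):
        out[i, j] = func(x[i, j])
    return out

apply_fmat(f, np.array([[10, 5], [3, 8]]))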
Related
I am trying to get good at NumPy and want to know if I can use values in existing arrays as indices for a function that returns values for another array. I can do this:
def somefun(i):
    return i + 1
x = np.array([2, 4, 5])
k_labs = np.arange(100)
k_labs2 = k_labs[somefun(x[:])]
But how do I deal with the case where x is a 2D array and I want to use one vector at a time as the index argument for a function, such as X[:, i], without using for-loops? That would be the case in:
x = np.array([[2, 4, 5], [7, 8, 9]])

def somefun(i):
    return i + 1

k_labs = np.arange(100)
k_labs2 = k_labs[somefun(x[:, i])]
EDIT ITERATION 2
To get the gist of what I am trying to accomplish, see the code below. In the function pred, I wanted to rewrite the commented-out lines in a NumPy fashion that might work better. However, I have problems with the two lines I put in instead: I get an error about mismatched broadcast dimensions in the function distance, at the line where I try to assign the computed norms to a variable.
class kNN:
    def __init__(self, X_train: np.array, label_train, val=None):
        self.X = X_train            # X[:-1, :]
        self.labels = label_train   # X[-1, :]
        # self.k = k
        self.kNN_4all = None        # np.zeros(self.X.shape[1])

    def distance(self, x1):
        # tile x1 into a matrix with one copy per sample for easy matrix subtraction
        x1 = np.tile(x1, (self.X.shape[1], 1))
        # transpose X so the norm is taken row-wise (axis 1)
        dists = np.linalg.norm(x1 - self.X.T, axis=1)
        return dists

    def k_nearest(self, x_vec, k):
        k_nearest = self.distance(x_vec)
        k_nearest = np.argsort(k_nearest)[:k]
        kNN_labs = np.zeros(k_nearest.shape)
        kNN_labs[:] = self.labels[k_nearest[:]]
        unique, vote = np.unique(kNN_labs, return_counts=True)
        return unique[np.argmax(vote)]

    def pred(self, X_test, k):
        self.kNN_4all = np.zeros(X_test.shape[1])
        self.kNN_4all = self.k_nearest(X_test[:, :], k)
        # for i in range(X_test.shape[1]):
        #     NewLabel = self.k_nearest(X_test[:, i], k)  # defines x_vec in matrix X
        #     self.kNN_4all[i] = NewLabel
        # return self.kNN_4all

    def prec(self, labels_val):
        elem_equal = (self.kNN_4all == labels_val).astype(int).flatten()
        prec = np.sum(elem_equal) / elem_equal.shape
        return 1 - prec[0]

X_train = X[:, :100]
labs_train = labs[:100]
pilot = kNN(X_train, labs_train)
pilot.pred(X[:, 100:200], 10)
pilot.prec(labs[100:200])
I get the following error:
ValueError: operands could not be broadcast together with shapes (78400,100) (100,784)
As we can see from the code, k_nearest(self, x_vec, k) takes a single 1D subarray, so passing a full matrix X causes the broadcasting error, since the functions inside k_nearest rely on being given only a 1D subarray.
I don't know whether it is really possible to avoid for-loops here and have NumPy step through 1D subarrays as arguments to a function, such that each call's result is assigned to a different cell of another array, in this case self.kNN_4all.
x = np.array([[2, 4, 5], [7, 8, 9], [33, 50, 71]])
x = x + 1
k_labs = np.arange(100)
ttt = k_labs[x]
print(ttt)
ttt is an array built by taking values from k_labs at the indices given in x. It can be accessed, for example, like this:
print(ttt[1])#[ 8 9 10]
If you want to use just one row of indices (for example x[2]), the code is as follows:
x = np.array([[2, 4, 5], [7, 8, 9], [33, 50, 71]])
x = x + 1
k_labs = np.arange(100)
print(k_labs[x[2]])
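If the function cannot be vectorized (like k_nearest in the kNN class above, which needs a single 1D vector per call), np.apply_along_axis will feed it one 1D subarray at a time without an explicit loop; note that it still loops internally, so it buys readability rather than speed. A sketch with a hypothetical per-column function:
def per_column(col):
    # stand-in for a function that only accepts a single 1D vector
    return col.sum()

x = np.array([[2, 4, 5], [7, 8, 9]])
result = np.apply_along_axis(per_column, 0, x)  # applies per_column to every column
print(result)  # [ 9 12 14]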
I am using a custom metric function with scipy's cdist function.
The custom function is something like
def cust_metric(u, v):
    dist = np.cumsum(np.gcd(u, v) * k)
    return dist
where k is an arbitrary coefficient.
Ideally, I was hoping to pass k as an argument when calling cdist like so:
d_ar = scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=7))
However, this throws an error.
I was wondering if there is a simple solution that I may be missing?
A quick but non-elegant fix is to declare k as a global variable and adjust it when needed.
According to its documentation, the value for metric should be a callable (or a string naming one of a fixed collection of built-in metrics). In your case you could obtain such a callable through a closure:
def cust_metric(k):
    return lambda u, v: np.cumsum(np.gcd(u, v) * k)
I do imagine your actual callable would look somewhat different, since np.cumsum returns an array, while the metric callable is supposed to produce a single scalar for each pair u, v. For example:
In [25]: arr1 = np.array([[5, 7], [6, 1]])
In [26]: arr2 = np.array([[6, 7], [6, 1]])
In [28]: def cust_metric(k):
...: return lambda u, v: np.sqrt(np.sum((k*u - v)**2))
...:
In [29]: scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=7))
Out[29]:
array([[51.03920062, 56.08029957],
[36. , 36.49657518]])
In [30]: scipy.spatial.distance.cdist(arr1, arr2, metric=cust_metric(k=1))
Out[30]:
array([[1. , 6.08276253],
[6. , 0. ]])
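An equivalent alternative to the closure is functools.partial, which keeps the metric a plain named function. A sketch using the same scalar-valued metric as above:
from functools import partial
import numpy as np
from scipy.spatial.distance import cdist

def cust_metric(u, v, k=1):
    return np.sqrt(np.sum((k * u - v) ** 2))

arr1 = np.array([[5, 7], [6, 1]])
arr2 = np.array([[6, 7], [6, 1]])
# cdist calls the metric as metric(u, v); partial fixes k beforehand
d_ar = cdist(arr1, arr2, metric=partial(cust_metric, k=7))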
To get the lowest 10 values of an array X I do something like:
lowest10 = np.argsort(X)[:10]
What is the most efficient way, avoiding loops, to filter the results so that I get the lowest 10 values whose indices are not elements of another array Y?
So for example if the array Y is:
[2,20,51]
X[2], X[20] and X[51] shouldn't be taken into consideration when computing the lowest 10.
After some benchmarking, here is my humble recommendation:
Swapping out appears to be more or less always faster than masking (even if 99% of X is forbidden). So use something along the lines of
swap = X[Y]
X[Y] = np.inf
Sorting is expensive, therefore use argpartition and only sort what's necessary, like
lowest10 = np.argpartition(X, 10)[:10]
lowest10 = lowest10[np.argsort(X[lowest10])]
(After the swap, X itself plays the role of the filtered array; restore it with X[Y] = swap when done.)
Here are some benchmarks:
import numpy as np
from timeit import timeit
def swap_out():
    global sol
    swap = X[Y]
    X[Y] = np.inf
    sol = np.argpartition(X, K)[:K]
    sol = sol[np.argsort(X[sol])]
    X[Y] = swap

def app1():
    sidx = X.argsort()
    return sidx[~np.in1d(sidx, Y)][:K]

def app2():
    sidx = np.argpartition(X, range(K + Y.size))
    return sidx[~np.in1d(sidx, Y)][:K]

def app3():
    sidx = np.argpartition(X, K + Y.size)
    return sidx[~np.in1d(sidx, Y)][:K]
K = 10 # number of small elements wanted
N = 10000 # size of X
M = 10 # size of Y
S = 10 # number of repeats in benchmark
X = np.random.random((N,))
Y = np.random.choice(N, (M,))
so = timeit(swap_out, number=S)
print(sol)
print(X[sol])
d1 = timeit(app1, number=S)
print(sol)
print(X[sol])
d2 = timeit(app2, number=S)
print(sol)
print(X[sol])
d3 = timeit(app3, number=S)
print(sol)
print(X[sol])
print('pp', f'{so:8.5f}', ' d1(um)', f'{d1:8.5f}', ' d2', f'{d2:8.5f}', ' d3', f'{d3:8.5f}')
# pp 0.00053 d1(um) 0.00731 d2 0.00313 d3 0.00149
Here's one approach -
sidx = X.argsort()
idx_out = sidx[~np.in1d(sidx, Y)][:10]
Sample run -
# Setup inputs
In [141]: X = np.random.choice(range(60), 60)
In [142]: Y = np.array([2,20,51])
# For testing, let's set the Y positions as 0s and
# we want to see them skipped in o/p
In [143]: X[Y] = 0
# Use proposed approach
In [144]: sidx = X.argsort()
In [145]: X[sidx[~np.in1d(sidx, Y)][:10]]
Out[145]: array([ 0, 2, 4, 5, 5, 9, 9, 10, 12, 14])
# Print the first 13 numbers and skip three 0s and
# that should match up with the output from proposed approach
In [146]: np.sort(X)[:13]
Out[146]: array([ 0, 0, 0, 0, 2, 4, 5, 5, 9, 9, 10, 12, 14])
Alternatively, for performance, we might want to use np.argpartition, like so -
sidx = np.argpartition(X, range(10 + Y.size))
idx_out = sidx[~np.in1d(sidx, Y)][:10]
This is beneficial when the length of X is much larger than 10.
If you don't care about the order of the elements in that list of 10 indices, we can get a further boost by simply passing the scalar length instead of the range array to np.argpartition: np.argpartition(X, 10 + Y.size).
We can optimize np.in1d with searchsorted to have one more approach (listing next).
Listing below all the discussed approaches in this post -
def app1(X, Y, n=10):
    sidx = X.argsort()
    return sidx[~np.in1d(sidx, Y)][:n]

def app2(X, Y, n=10):
    sidx = np.argpartition(X, range(n + Y.size))
    return sidx[~np.in1d(sidx, Y)][:n]

def app3(X, Y, n=10):
    sidx = np.argpartition(X, n + Y.size)
    return sidx[~np.in1d(sidx, Y)][:n]

def app4(X, Y, n=10):
    n_ext = n + Y.size
    sidx = np.argpartition(X, np.arange(n_ext))[:n_ext]
    ssidx = sidx.argsort()
    mask = np.ones(ssidx.size, dtype=bool)
    search_idx = np.searchsorted(sidx, Y, sorter=ssidx)
    search_idx[search_idx == sidx.size] = 0
    idx = ssidx[search_idx]
    mask[idx[sidx[idx] == Y]] = 0
    return sidx[mask][:n]
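A quick sanity check that the order-preserving approaches return identical index lists (a sketch; app3 returns its picks unordered, so it is left out of the exact comparison):
X = np.random.random(1000)
Y = np.random.choice(1000, 10, replace=False)
assert np.array_equal(app1(X, Y), app2(X, Y))
assert np.array_equal(app1(X, Y), app4(X, Y))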
You can work on a subset of the original array using numpy.delete():
lowest10 = np.argsort(np.delete(X, Y))[:10]
Note that delete builds a new array from the elements that are kept, so this costs a copy of X.
Warning: this solution uses a subset of the original X array (X without the elements indexed by Y), so the resulting indices refer to that subset, not to the original array.
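If you need indices into the original X rather than into the subset, you can carry along the array of surviving indices; a small sketch:
keep = np.delete(np.arange(X.size), Y)             # original indices that survive
lowest10_subset = np.argsort(np.delete(X, Y))[:10]
lowest10 = keep[lowest10_subset]                   # indices into the original X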
I have a NumPy array A with shape (m,n) and want to run all the elements through some function f. For a non-constant function such as for example f(x) = x or f(x) = x**2 broadcasting works perfectly fine and returns the expected result. For f(x) = 1, applying the function to my array A however just returns the scalar 1.
Is there a way to force broadcasting to keep the shape, i.e. in this case to return an array of 1s?
f(x) = 1 as written broadcasts to a plain scalar because the constant 1 does not depend on x; you need to create a function with def or lambda that returns 1, then use np.vectorize to apply it to your array. (Note that np.vectorize is essentially a Python-level loop, provided for convenience rather than performance.)
>>> import numpy as np
>>> f = lambda x: 1
>>>
>>> f = np.vectorize(f)
>>>
>>> f(np.arange(10).reshape(2, 5))
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
This sounds like a job for np.ones_like, or np.full_like in the general case:
def f(x):
    result = np.full_like(x, 1)  # or np.full_like(x, 1, dtype=int) if you
                                 # don't want to inherit the dtype of x
    if result.ndim == 0:
        # Return a scalar instead of a 0D array.
        return result[()]
    else:
        return result
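For example, applied to a 2x5 integer array it keeps the shape, and applied to a 0D input it returns a scalar:
>>> f(np.arange(10).reshape(2, 5))
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])
>>> f(np.float64(3.0))
1.0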
Use x.fill(1). Make sure to return the array explicitly, as fill doesn't return a new array; it modifies x in place and returns None.
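A minimal sketch of that approach (filling a copy so the caller's array isn't clobbered, which is an assumption about the desired behaviour):
def f(x):
    out = np.empty_like(x)
    out.fill(1)  # fill works in place and returns None
    return out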
I have a simple function
def square(x, a=1):
    return [x**2 + a, 2*x]
I want to minimize it over x, for several parameters a. I currently have loops that, in spirit, do something like this:
In [89]: from scipy import optimize
In [90]: res = optimize.minimize(square, 25, method='BFGS', jac=True)
In [91]: [res.x, res.fun]
Out[91]: [array([ 0.]), 1.0]
In [92]: l = lambda x: square(x, 2)
In [93]: res = optimize.minimize(l, 25, method='BFGS', jac=True)
In [94]: [res.x, res.fun]
Out[94]: [array([ 0.]), 2.0]
Now, the function is already vectorized:
In [98]: square(np.array([2, 3]))
Out[98]: [array([ 5, 10]), array([4, 6])]
In [99]: square(np.array([2, 3]), np.array([2, 3]))
Out[99]: [array([ 6, 12]), array([4, 6])]
This means it would probably be much faster to run all the optimizations in parallel rather than looping. Is that something that's easily doable with SciPy, or any other third-party tool?
Here's another try, based on my original answer and the discussion that followed.
As far as I know, the scipy.optimize module is for functions with scalar or vector inputs and a scalar output, or "cost".
Since you're treating each equation as independent of the others, my best idea is to use the multiprocessing module to do the work in parallel. If the functions you're minimizing are as simple as the ones in your question, I'd say it's not worth the effort.
If the functions are more complex, and you'd like to divide the work up, try something like:
import numpy as np
from scipy import optimize
from multiprocessing import Pool

def square(x, a=1):
    return [np.sum(x**2 + a), 2*x]

def minimize(args):
    f, x0, a = args
    res = optimize.minimize(f, x0, method='BFGS', jac=True, args=(a,))
    return res.x

if __name__ == '__main__':
    # your a values
    a = np.arange(1, 11)
    # initial guess for all the x values
    x = np.empty(len(a))
    x[:] = 25
    # one (function, initial guess, parameter) tuple per optimization
    args = [(square, x[i], a[i]) for i in range(10)]
    p = Pool(4)
    print(p.map(minimize, args))
I am a bit late to the party, but this may be interesting for people who want to reduce minimization time by parallel computing:
We implemented a parallel version of scipy.optimize.minimize(method='L-BFGS-B') in the package optimparallel, available on PyPI. It can speed up the optimization by evaluating the objective function and the (approximate) gradient in parallel. Here is an example:
from optimparallel import minimize_parallel

def my_square(x, a=1):
    return (x - a)**2

minimize_parallel(fun=my_square, x0=1, args=11)
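The return value mirrors SciPy's OptimizeResult, so the minimizer and objective value can be read the same way as with scipy.optimize.minimize (a quick sketch; for my_square above the minimum sits at x = 11):
res = minimize_parallel(fun=my_square, x0=1, args=11)
print(res.x, res.fun)  # approximately [11.] and 0.0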
Note that the parallel implementation only reduces the optimization time for objective functions with a long evaluation time (say, longer than 0.1 seconds); the package documentation includes an illustration of the possible parallel scaling.
If I understand your intent, you can pass numpy arrays for both x and a, so you can optimize for all your a parameters at once.
Try something like:
def square(x, a=1):
    return [np.sum(x**2 + a), 2*x]
# your a values
a = np.arange(1,11)
# initial guess for all the x values
x = np.empty(len(a))
x[:] = 25
# extra arguments to pass to the objective function, in this case your a values
args = (a,)
res = optimize.minimize(square, x, method='BFGS', jac=True, args=args)
This appears to give the correct results: each component is an independent problem minimized at x_i ≈ 0 with value a_i, so the summed objective should be 1 + 2 + ... + 10 = 55.
>>> res.x
[ -8.88178420e-16 -8.88178420e-16 -8.88178420e-16 -8.88178420e-16
-8.88178420e-16 -8.88178420e-16 -8.88178420e-16 -8.88178420e-16
-8.88178420e-16 -8.88178420e-16]
>>> res.fun
55.0