I would like to speed up my code by evaluating a function once on a NumPy array instead of calling a function from a Python library (GalSim) inside a for loop. Suppose I have the following function:
import numpy as np
import galsim
from math import *
M200=1e14
conc=6.9
def func(M200, conc):
    halo_z = 0.2
    halo_pos_arcsec = [1200., 3769.7]
    halo_pos = galsim.PositionD(x=halo_pos_arcsec[0], y=halo_pos_arcsec[1])
    nfw = galsim.NFWHalo(mass=M200, conc=conc, redshift=halo_z, halo_pos=halo_pos, omega_m=0.3, omega_lam=0.7)
    for i in range(len(shear_z)):
        shear_pos = galsim.PositionD(x=pos_arcsec[i,0], y=pos_arcsec[i,1])
        model_g1, model_g2 = nfw.getShear(pos=shear_pos, z_s=shear_z[i])
        l = np.sum(model_g1-model_g2)/sqrt(np.pi)
    return l
Here pos_arcsec is a two-dimensional array of shape 24000x2 and shear_z is a 1D array with 24000 elements.
The main problem is that I want to evaluate this function on a grid where M200=np.arange(13., 16., 0.01) and conc = np.arange(3, 10, 0.01). I don't know how to broadcast this function so that it is evaluated over the two-dimensional grid of M200 and conc. The code takes a long time to run. I am looking for the best approaches to speed up these calculations.
This should work when pos is an array of shape (n, 2):
import numpy as np
def f(pos, z):
    r = np.sqrt(pos[...,0]**2 + pos[...,1]**2)
    return np.log(r)*(z+1)
Example:
z = np.arange(10)
pos = np.arange(20).reshape(10,2)
f(pos,z)
# array([ 0. , 2.56494936, 5.5703581 , 8.88530251,
# 12.44183436, 16.1944881 , 20.11171117, 24.17053133,
# 28.35353608, 32.64709419])
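The same broadcasting idea can be extended to the parameter grid from the question, provided the model can be written with NumPy operations. The following is only a sketch: toy_model is a hypothetical stand-in (it is not GalSim's NFW shear), the random data replace pos_arcsec and shear_z, and the grid is assumed to hold log10 masses and is kept coarse for illustration.

```python
import numpy as np

def toy_model(m200, conc, r, z):
    # Hypothetical stand-in for the real shear model; any expression built
    # from NumPy ufuncs broadcasts in the same way.
    return np.log10(m200) * conc / (1.0 + r) * (1.0 + z)

rng = np.random.default_rng(0)
pos = rng.uniform(1.0, 100.0, size=(50, 2))   # stand-in for pos_arcsec
z = rng.uniform(0.3, 1.0, size=50)            # stand-in for shear_z
r = np.hypot(pos[:, 0], pos[:, 1])            # distance of each point from the origin

m200 = 10.0 ** np.arange(13.0, 16.0, 0.5)     # coarse mass grid (assumed log10 spacing)
conc = np.arange(3.0, 10.0, 1.0)              # coarse concentration grid

# Shapes (nM, 1, 1), (1, nC, 1) and (n,) broadcast to (nM, nC, n)
values = toy_model(m200[:, None, None], conc[None, :, None], r, z)
grid = values.sum(axis=-1)                    # reduce over the points -> (nM, nC)
print(grid.shape)   # (6, 7)
```

With the full grid from the question (300 x 700 parameter pairs and 24000 points) the intermediate array would hold roughly 5e9 elements, so in practice it is probably necessary to process the grid in chunks along one parameter axis.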
Use numpy.linalg.norm
If you have an array:
import numpy as np
import numpy.linalg as la
a = np.array([[3, 4], [5, 12], [7, 24]])
then you can determine the magnitude of each row vector (sqrt(a^2 + b^2)) by
b = la.norm(a, axis=1)
>>> print(b)
[ 5. 13. 25.]
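As a quick check, a sketch using the same pos array as the example in the previous answer: np.linalg.norm along the last axis gives the same radius as the explicit square-root expression.

```python
import numpy as np

pos = np.arange(20, dtype=float).reshape(10, 2)

# Explicit formula, as used in f(pos, z)
r_manual = np.sqrt(pos[..., 0]**2 + pos[..., 1]**2)
# Same quantity via the Euclidean norm along the last axis
r_norm = np.linalg.norm(pos, axis=-1)

print(np.allclose(r_manual, r_norm))   # True
```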
I am trying to write a code where I do a summation over a function with two variables, E and n,
where n is a range of 0-5, and plot the result which should be a multiple step function.
import matplotlib.pyplot as plt
import numpy as np
E = np.linspace(0, 10, 10000)
for n in range(0, 6):
    h = []
    h.append(1/(1 + np.exp(-2*np.pi * (E-(n-0.5)*3))))
    print(h)
plt.plot(E,h)
This is the code I have been using. It produces multiple arrays for h:
[array([0.99991931, 0.99991981, 0.99992031, ..., 1. , 1. ,
1. ])]
[array([8.06930057e-05, 8.12016207e-05, 8.17134412e-05, ...,
1.00000000e+00, 1.00000000e+00, 1.00000000e+00])]
[array([5.25548518e-13, 5.28861364e-13, 5.32195094e-13, ...,
1.00000000e+00, 1.00000000e+00, 1.00000000e+00])]
[array([3.42258854e-21, 3.44416317e-21, 3.46587379e-21, ...,
9.99999847e-01, 9.99999848e-01, 9.99999849e-01])]
[array([2.22893072e-29, 2.24298100e-29, 2.25711985e-29, ...,
4.09276641e-02, 4.11750329e-02, 4.14238322e-02])]
[array([1.45157155e-37, 1.46072167e-37, 1.46992947e-37, ...,
2.77912110e-10, 2.79663956e-10, 2.81426846e-10])]
but when I try to plot I get the following error:
ValueError: x and y must have same first dimension, but have shapes (10000,) and (1, 10000)
I don't understand what is causing this, and any help would be appreciated.
I assume what you want is the following:
E = np.expand_dims(np.linspace(0, 10, 10000), 1)
n = np.arange(0, 6)
h = 1/(1 + np.exp(-2*np.pi * (E-(n-0.5)*3)))
plt.plot(E, h)
Note that this vectorized calculation does not require a loop: NumPy broadcasting automatically combines the E and n vectors into a 2D h array, provided the input dimensions are compatible (note the np.expand_dims). In general, whenever you see the need for a loop in NumPy, it's good advice to take a step back and think about vectorization.
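To make the shapes explicit, here is a small sketch with only 5 sample points instead of 10000 (the variable total is added for illustration, since the question mentions a summation over n):

```python
import numpy as np

E = np.linspace(0, 10, 5)[:, None]   # shape (5, 1), same effect as np.expand_dims(..., 1)
n = np.arange(0, 6)                  # shape (6,)

# (5, 1) combined with (6,) broadcasts to (5, 6): one column per value of n
h = 1 / (1 + np.exp(-2 * np.pi * (E - (n - 0.5) * 3)))
print(E.shape, n.shape, h.shape)     # (5, 1) (6,) (5, 6)

# Summing over the n axis gives a single curve made of several steps
total = h.sum(axis=1)
print(total.shape)                   # (5,)
```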
Fixed version of your original code:
import matplotlib.pyplot as plt
import numpy as np
E = np.linspace(0, 10, 10000)
h= []
for n in range(0, 6):
    h.append(1/(1 + np.exp(-2*np.pi * (E-(n-0.5)*3))))
plt.plot(E, np.stack(h).T)
Define h before the loop, such that it is not defined as the empty list on each iteration.
Combine the list of 1D arrays into a 6x10_000 2D array using np.stack and transpose (.T) in order to plot all curves.
Is it possible to generate random numbers that are almost equally spaced, but not exactly the same as the output of numpy.linspace?
I looked into the numpy.random.uniform function, but it does not give the required results.
Moreover, the sum of the generated values should be the same as the sum of the values generated by numpy.linspace.
code
import numpy as np
np.random.seed(42)
data=np.random.uniform(2,4,10)
print(data)
You might consider drawing random samples around the output of numpy.linspace. Using these numbers as the means of a normal distribution with a small standard deviation generates numbers close to the output of numpy.linspace. For example,
>>> import numpy as np
>>> exact_numbers = np.linspace(2.0, 10.0, num=5)
>>> exact_numbers
array([ 2., 4., 6., 8., 10.])
>>> approximate_numbers = np.random.normal(exact_numbers, np.ones(5) * 0.1)
>>> approximate_numbers
array([2.12950013, 3.9804745 , 5.80670316, 8.07868932, 9.85288221])
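The question also asks for the sum to match that of numpy.linspace. One possible way to get this, sketched here under the assumption that small Gaussian noise is acceptable, is to subtract the mean of the noise before adding it, so that the noise sums to zero. The seed and the standard deviation 0.02 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

exact = np.linspace(2.0, 4.0, num=10)
noise = rng.normal(0.0, 0.02, size=exact.size)
noise -= noise.mean()            # zero-sum noise leaves the total unchanged

jittered = exact + noise
print(jittered)
print(np.isclose(jittered.sum(), exact.sum()))   # True
```

Note that this does not guarantee that the values stay inside [2, 4] or remain sorted; a smaller standard deviation makes both more likely.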
Maybe this trick helps: combine numpy.linspace and numpy.random.uniform, repeatedly pick two random indexes, and increase the value at one of them while decreasing the value at the other by the same amount, so the sum stays unchanged.
(You can change size=10 and threshold=0.1 to control how far the random numbers move up or down.)
import numpy as np
size = 10
threshold = 0.1
r = np.linspace(2,4,size) # r.sum()=30
# array([2. , 2.22222222, 2.44444444, 2.66666667, 2.88888889,
# 3.11111111, 3.33333333, 3.55555556, 3.77777778, 4. ])
c = np.random.uniform(0,threshold,size)
# array([0.02246768, 0.08661081, 0.0932445 , 0.00360563, 0.06539992,
# 0.0107167 , 0.06490493, 0.0558159 , 0.00268924, 0.00070247])
s = np.random.choice(range(size), size+1)
# array([5, 5, 8, 3, 6, 4, 1, 8, 7, 1, 7])
for idx, (i,j) in enumerate(zip(s, s[1:])):
    r[i] += c[idx]
    r[j] -= c[idx]
print(r)
print(r.sum())
Output:
[2. 2.27442369 2.44444444 2.5770278 2.83420567 3.19772192
3.39512762 3.50172642 3.77532244 4. ]
30
Here's a brief example of a function. It maps a vector to a vector. However, entries that are NaN or inf should be ignored. Currently this looks rather clumsy to me. Do you have any suggestions?
from scipy import stats
import numpy as np
def p(vv):
    mask = np.isfinite(vv)
    y = np.nan * vv
    v = vv[mask]
    y[mask] = 1/v*(stats.hmean(v)/len(v))
    return y
You can change the NaN values to zero with Numpy's isnan function and then remove the zeros as follows:
import numpy as np
def p(vv):
    # assuming vv is your array
    # use NumPy's isnan function to replace the NaN values in the array with zero
    replace_NaN = np.isnan(vv)
    vv[replace_NaN] = 0
    # convert array vv to list
    vv_list = vv.tolist()
    new_list = []
    # loop over vv_list and exclude 0 values:
    for i in vv_list:
        if i != 0:
            new_list.append(i)
    # set array vv again
    vv = np.array(new_list, dtype = 'float64')
    return vv
I have come up with this kind of construction:
from scipy import stats
import numpy as np
## operate only on the valid entries of x and use the same mask on the resulting vector y
def __f(func, x):
    mask = np.isfinite(x)
    y = np.nan * x
    y[mask] = func(x[mask])
    return y
# implementation of the parity function
def __pp(x):
    return 1/x*(stats.hmean(x)/len(x))
def pp(vv):
    return __f(__pp, vv)
Masked arrays accomplish this functionality and allow you to specify the mask as you desire. The numpy 1.18 docs for it are here: https://numpy.org/doc/1.18/reference/maskedarray.generic.html#what-is-a-masked-array
In a masked array, entries whose mask value is False take part in calculations, while entries whose mask value is True are ignored.
Example for obtaining the mean of only the finite values using np.isfinite():
import numpy as np
# Seeding for reproducing these results
np.random.seed(0)
# Generate random data and add some non-finite values
x = np.random.randint(0, 5, (3, 3)).astype(np.float32)
x[1,2], x[2,1], x[2,2] = np.inf, -np.inf, np.nan
# array([[ 4., 0., 3.],
# [ 3., 3., inf],
# [ 3., -inf, nan]], dtype=float32)
# Make masked array. Note the logical not of isfinite
x_masked = np.ma.masked_array(x, mask=~np.isfinite(x))
# Mean of entire masked matrix
x_masked.mean()
# 2.6666666666666665
# Masked matrix's row means
x_masked.mean(1)
# masked_array(data=[2.3333333333333335, 3.0, 3.0],
# mask=[False, False, False],
# fill_value=1e+20)
# Masked matrix's column means
x_masked.mean(0)
# masked_array(data=[3.3333333333333335, 1.5, 3.0],
# mask=[False, False, False],
# fill_value=1e+20)
Note that scipy.stats.hmean() also works with masked arrays.
Note that if all you care about is detecting NaNs and leaving infs, then you can use np.isnan() instead of np.isfinite().
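For comparison, the function p from the question could be written with a masked array roughly as follows. This is only a sketch: p_masked is a hypothetical name, the harmonic mean is computed directly as n / sum(1/x) rather than with scipy.stats.hmean, and the finite entries are assumed to be positive.

```python
import numpy as np

def p_masked(vv):
    # Mask NaN and +/-inf entries; they are skipped in all reductions below
    m = np.ma.masked_invalid(np.asarray(vv, dtype=float))
    inv = 1.0 / m                    # reciprocal of the valid entries only
    n = m.count()                    # number of valid (unmasked) entries
    hmean = n / inv.sum()            # harmonic mean of the valid entries
    y = inv * (hmean / n)
    return y.filled(np.nan)          # masked positions come back as NaN

out = p_masked(np.array([1.0, np.nan, 2.0, np.inf, 4.0]))
print(out)
```

The result has the same length as the input, with NaN wherever the input was not finite.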
I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm
This function handles the situation where vector v has the norm value of 0.
Are there any similar functions provided in sklearn or numpy?
If you're using scikit-learn you can use sklearn.preprocessing.normalize:
import numpy as np
from sklearn.preprocessing import normalize
x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
print(np.allclose(norm1, norm2))
# True
I agree that it would be nice if such a function were part of the included libraries. But it isn't, as far as I know. So here is a version for arbitrary axes that should perform well.
import numpy as np
def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2==0] = 1
    return a / np.expand_dims(l2, axis)
A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))
print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))
This might also work for you
import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))
but fails when v has length 0.
In that case, introducing a small constant to prevent the zero division solves this.
As proposed in the comments one could also use
v/np.linalg.norm(v)
To avoid zero division I use eps, but that's maybe not great.
def normalize(v):
    norm=np.linalg.norm(v)
    if norm==0:
        norm=np.finfo(v.dtype).eps
    return v/norm
If you have multidimensional data and want each axis normalized to its max or its sum:
def normalize(_d, to_sum=True, copy=True):
    # _d is an (n x dimension) NumPy array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d
This uses NumPy's peak-to-peak function, np.ptp.
a = np.random.random((5, 3))
b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), the columns sum to 1
c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1
If you don't need utmost precision, your function can be reduced to:
v_norm = v / (np.linalg.norm(v) + 1e-16)
You mentioned scikit-learn, so I want to share another solution.
scikit-learn MinMaxScaler
In scikit-learn, there is an API called MinMaxScaler, which lets you customize the value range as you like.
It also deals with NaN issues for us.
NaNs are treated as missing values: disregarded in fit, and maintained
in transform. ... see reference [1]
Code sample
The code is simple, just type
# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# create a MinMaxScaler object
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)
Reference
[1] sklearn.preprocessing.MinMaxScaler
There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:
import transformations as trafo
import numpy as np
data = np.array([[1.0, 1.0, 0.0],
[1.0, 1.0, 1.0],
[1.0, 2.0, 3.0]])
print(trafo.unit_vector(data, axis=1))
If you work with a multidimensional array, the following fast solution is possible.
Say we have a 2D array that we want to normalize along the last axis, while some rows have zero norm.
import numpy as np
arr = np.array([
    [1, 2, 3],
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)
lengths = np.linalg.norm(arr, axis=-1)
print(lengths) # [ 3.74165739 0. 10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
# [0. 0. 0. ]
# [0.47673129 0.57207755 0.66742381]]
If you want to normalize n dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:
import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize
vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()
If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.
import numpy as np
import vg
x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.allclose(norm1, norm2))
# True
I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.
Without sklearn, using just numpy, you can define a function yourself.
Assuming that the rows are the variables and the columns are the samples (axis=1):
import numpy as np
# Example array
X = np.array([[1,2,3],[4,5,6]])
def stdmtx(X):
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)
output:
X
array([[1, 2, 3],
[4, 5, 6]])
stdmtx(X)
array([[-1., 0., 1.],
[-1., 0., 1.]])
For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.
a / np.linalg.norm(a, axis=1, keepdims=True)
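The one-liner divides by zero for rows with zero norm. A possible variant, sketched here under the assumption that zero rows should simply stay zero (as in the function from the question), replaces zero norms by 1 before dividing:

```python
import numpy as np

a = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])

# keepdims=True keeps the shape (3, 1), so the division broadcasts per row
norms = np.linalg.norm(a, axis=1, keepdims=True)
safe = np.where(norms == 0, 1.0, norms)   # avoid 0/0 for all-zero rows
unit = a / safe
print(unit)
```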
If you want all values in [0; 1] for 1d-array then just use
(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
Where a is your 1d-array.
An example:
>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])
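A note on this method: it preserves the proportions between values only if the 1d-array contains at least one 0 and otherwise consists of positive numbers, so that the minimum is 0. If the minimum is not 0, subtracting it shifts every value and the ratios change.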
A simple dot product would do the job. No need for any extra package.
x = x/np.sqrt(x.dot(x))
By the way, if the norm of x is zero, it is inherently a zero vector, and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0,0,...0]), then use
norm = np.sqrt(x.dot(x))
x = x/norm if norm != 0 else x
I want to perform an SVD on a 12*12 matrix. numpy.linalg.svd works fine, but when I try to get the 12*12 matrix A back by computing u*s*v, I don't get it back.
import cv2
import numpy as np
import scipy as sp
from scipy import linalg, matrix
a_matrix=np.zeros((12,12))
with open('/home/koustav/Documents/ComputerVision/A2/codes/Points0.txt','r') as f:
    for (j,line) in enumerate(f):
        i=2*j
        if(i%2==0):
            values=np.array(list(map(np.double,line.strip('\n').split(' '))))
            a_matrix[i,4]=-values[2]
            a_matrix[i,5]=-values[3]
            a_matrix[i,6]=-values[4]
            a_matrix[i,7]=-1
            a_matrix[i,8]=values[1]*values[2]
            a_matrix[i,9]=values[1]*values[3]
            a_matrix[i,10]=values[1]*values[4]
            a_matrix[i,11]=values[1]*1
            a_matrix[i+1,0]=values[2]
            a_matrix[i+1,1]=values[3]
            a_matrix[i+1,2]=values[4]
            a_matrix[i+1,3]=1
            a_matrix[i+1,8]=-values[0]*values[2]
            a_matrix[i+1,9]=-values[0]*values[3]
            a_matrix[i+1,10]=-values[0]*values[4]
            a_matrix[i+1,11]=-values[0]*1
s_matrix=np.zeros((12,12))
u, s, v = np.linalg.svd(a_matrix,full_matrices=1)
k=0
while (k<12):
    s_matrix[k,k]=s[k]
    k+=1
print(u)
print('\n')
print(s_matrix)
print('\n')
print(u*s_matrix*v)
These are the points that I have used:
285.12 14.91 2.06655 -0.807071 -6.06083
243.92 100.51 2.23268 -0.100774 -5.63975
234.7 176.3 2.40898 0.230613 -5.10977
-126.59 -152.59 -1.72487 4.96296 -10.4564
-173.32 -164.64 -2.51852 4.95202 -10.3569
264.81 28.03 2.07303 -0.554853 -6.05747
Please suggest something...
Apart from the code and time you could save by using built-in functions like numpy.diag, your problem seems to be the * operator. On NumPy arrays, * multiplies element-wise; use numpy.dot (or the @ operator) for matrix multiplication. See the code below for a working example:
In [16]: import numpy as np
In [17]: A = np.arange(15).reshape(5,3)
In [18]: A
Out[18]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [19]: u, s, v = np.linalg.svd(A)
In [20]: S = np.diag(s)
In [21]: S = np.vstack([S, np.zeros((2,3)) ])
In [22]: #fill in zeros to get the right shape
In [23]: np.allclose(A, np.dot(u, np.dot(S,v)))
Out[23]: True
numpy.allclose checks whether two arrays are numerically close...
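For the square 12x12 case from the question, the same check can be sketched as follows. The matrix here is random, a stand-in for the one built from the points file. Note that numpy.linalg.svd returns the third factor already transposed (called vt below), so no further transpose is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(12, 12))            # random stand-in for the 12x12 matrix

u, s, vt = np.linalg.svd(A, full_matrices=False)

# Matrix products rebuild A
reconstructed = u @ np.diag(s) @ vt
print(np.allclose(A, reconstructed))     # True

# Element-wise products, which is what * does on arrays, do not
elementwise = u * np.diag(s) * vt
print(np.allclose(A, elementwise))
```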