I need to fit points from different datasets with straight lines, one line per dataset. So I have parameters ai and bi that describe the i-th line: ai + bi*x. The problem is that I want to force all the ai to be equal, because I want the same intercept for every line. I found a tutorial here: http://www.scipy.org/Cookbook/FittingData#head-a44b49d57cf0165300f765e8f1b011876776502f. The difference is that I don't know a priori how many datasets I have. My code is this:
from numpy import *
from scipy import optimize
# here I have 3 dataset, but in general I don't know how many dataset are they
ypoints = [array([0, 2.1, 2.4]), # first dataset, 3 points
array([0.1, 2.1, 2.9]), # second dataset
array([-0.1, 1.4])] # only 2 points
xpoints = [array([0, 2, 2.5]), # first dataset
array([0, 2, 3]), # second, also x coordinates are different
array([0, 1.5])] # the first coordinate is always 0
fitfunc = lambda a, b, x: a + b * x
errfunc = lambda p, xs, ys: array([ yi - fitfunc(p[0], p[i+1], xi)
for i, (xi,yi) in enumerate(zip(xs, ys)) ])
p_arrays = [r_[0.]] * len(xpoints)
pinit = r_[[ypoints[0][0]] + p_arrays]
fit_parameters, success = optimize.leastsq(errfunc, pinit, args = (xpoints, ypoints))
I got
Traceback (most recent call last):
File "prova.py", line 19, in <module>
fit_parameters, success = optimize.leastsq(errfunc, pinit, args = (xpoints, ypoints))
File "/usr/lib64/python2.6/site-packages/scipy/optimize/minpack.py", line 266, in leastsq
m = check_func(func,x0,args,n)[0]
File "/usr/lib64/python2.6/site-packages/scipy/optimize/minpack.py", line 12, in check_func
res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
File "prova.py", line 14, in <lambda>
for i, (xi,yi) in enumerate(zip(xs, ys)) ])
ValueError: setting an array element with a sequence.
If you just need a linear fit, then it is better to estimate it with linear regression instead of a non-linear optimizer.
More fit statistics could be obtained by using scikits.statsmodels instead.
import numpy as np
from numpy import array
ypoints = np.r_[array([0, 2.1, 2.4]), # first dataset, 3 points
array([0.1, 2.1, 2.9]), # second dataset
array([-0.1, 1.4])] # only 2 points
xpoints = [array([0, 2, 2.5]), # first dataset
array([0, 2, 3]), # second, also x coordinates are different
array([0, 1.5])] # the first coordinate is always 0
xp = np.hstack(xpoints)
indicator = []
for i, a in enumerate(xpoints):
    indicator.extend([i] * len(a))
indicator = np.array(indicator)
# one slope column per dataset; len(xpoints) replaces the hard-coded 3
x = xp[:,None]*(indicator[:,None]==np.arange(len(xpoints))).astype(int)
x = np.hstack((np.ones((xp.shape[0],1)),x))
print(np.dot(np.linalg.pinv(x), ypoints))
# [ 0.01947973 0.98656987 0.98481549 0.92034684]
The matrix of regressors has a common intercept, but different columns for each dataset:
>>> x
array([[ 1. , 0. , 0. , 0. ],
[ 1. , 2. , 0. , 0. ],
[ 1. , 2.5, 0. , 0. ],
[ 1. , 0. , 0. , 0. ],
[ 1. , 0. , 2. , 0. ],
[ 1. , 0. , 3. , 0. ],
[ 1. , 0. , 0. , 0. ],
[ 1. , 0. , 0. , 1.5]])
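Since the question says the number of datasets is not known in advance, the same design matrix can be built for any number of them. The following is only a sketch of that idea: the function name fit_common_intercept is made up for illustration, and np.linalg.lstsq is swapped in for the pinv product used above.

```python
import numpy as np

def fit_common_intercept(xs, ys):
    """Least-squares fit of y = a + b_i * x with one intercept a shared by all datasets."""
    x_all = np.concatenate(xs)
    y_all = np.concatenate(ys)
    # dataset index of every point, e.g. [0, 0, 0, 1, 1, 1, 2, 2]
    idx = np.repeat(np.arange(len(xs)), [len(x) for x in xs])
    # column 0 is the shared intercept, column i + 1 holds the x values of dataset i
    design = np.zeros((x_all.size, len(xs) + 1))
    design[:, 0] = 1.0
    design[np.arange(x_all.size), idx + 1] = x_all
    coef, *_ = np.linalg.lstsq(design, y_all, rcond=None)
    return coef[0], coef[1:]

xpoints = [np.array([0, 2, 2.5]), np.array([0, 2, 3]), np.array([0, 1.5])]
ypoints = [np.array([0, 2.1, 2.4]), np.array([0.1, 2.1, 2.9]), np.array([-0.1, 1.4])]
intercept, slopes = fit_common_intercept(xpoints, ypoints)
print(intercept, slopes)
```

For the sample data this should reproduce the coefficients printed above (intercept first, then one slope per dataset).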
(Side note: use def, not lambda assigned to a name -- that's utterly silly and has nothing but downsides, lambda's only use is making anonymous functions!).
Your errfunc should return a sequence (array or otherwise) of floating point numbers, but it's not, because you're trying to put as the items of your arrays the arrays which are the differences each y point (remember, ypoints aka ys is a list of arrays!) and the fit functions' results. So you need to "collapse" the expression yi - fitfunc(p[0], p[i+1], xi) to a single floating point number, e.g. norm(yi - fitfunc(p[0], p[i+1], xi)).
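Another way to get a flat sequence of floats, sketched here as an alternative to taking a norm, is to concatenate the per-dataset residuals into one 1-D array. This keeps one residual per data point, which matters because leastsq expects at least as many residuals as parameters, and one norm per dataset would give only three residuals for four parameters here. The name residuals is chosen for illustration.

```python
import numpy as np
from scipy import optimize

xpoints = [np.array([0, 2, 2.5]), np.array([0, 2, 3]), np.array([0, 1.5])]
ypoints = [np.array([0, 2.1, 2.4]), np.array([0.1, 2.1, 2.9]), np.array([-0.1, 1.4])]

def residuals(p, xs, ys):
    # p[0] is the shared intercept, p[i + 1] is the slope of dataset i
    return np.concatenate([y - (p[0] + p[i + 1] * x)
                           for i, (x, y) in enumerate(zip(xs, ys))])

# one intercept plus one slope per dataset, whatever the number of datasets
pinit = np.zeros(len(xpoints) + 1)
fit_parameters, ier = optimize.leastsq(residuals, pinit, args=(xpoints, ypoints))
print(fit_parameters)
```

Because the model is linear in its parameters, the result should agree with the linear-regression answer to within the optimizer's tolerance.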
I have an ndarray of eigenvalues and their multiplicities (for instance, np.array([(2.2, 2), (3, 3), (5, 1)])). I need to compute the Jordan matrix for these eigenvalues without using Python loops and iterables (list comprehensions, for loops etc.), only by using NumPy's functions.
I decided to build the matrix in these steps:
Create these blocks using np.vectorize and np.eye with np.fill_diagonal:
Combine the blocks into one matrix using hstack and vstack.
But I've got two problems:
Here's a snippet of my block-creating code:
def eye(t):
    eye = np.eye(t[1].astype(int),k=1)
    return eye
def jordan_matrix(X: np.ndarray) -> np.ndarray:
    dim = np.sum(X[:,1].astype(int))
    eyes = np.vectorize(eye, signature='(x)->(n,m)')(X)
    return eyes
And I'm getting the error ValueError: could not broadcast input array from shape (3,3) into shape (2,2)
I need to create extra zero matrices to fill the space which is not used by the created blocks, but their sizes are variable and I can't figure out how to create them without using Python's for and its equivalents.
Am I on the right track? How can I get around these problems?
np.vectorize would basically loop under the hood. We could use NumPy functions for actual vectorization at the Python level. Here's one such way -
def blockwise_jordan(a):
    r = a[:,1].astype(int)
    v = np.repeat(a[:,0],r)
    out = np.diag(v)
    n = out.shape[1]
    fillvals = np.ones(n, dtype=out.dtype)
    fillvals[r[:-1].cumsum()-1] = 0
    out.flat[1::out.shape[1]+1] = fillvals
    return out
Sample run -
In [52]: X = np.array([(2.2, 2), (3, 3), (5, 1)])
In [53]: blockwise_jordan(X)
Out[53]:
array([[2.2, 1. , 0. , 0. , 0. , 0. ],
[0. , 2.2, 0. , 0. , 0. , 0. ],
[0. , 0. , 3. , 1. , 0. , 0. ],
[0. , 0. , 0. , 3. , 1. , 0. ],
[0. , 0. , 0. , 0. , 3. , 0. ],
[0. , 0. , 0. , 0. , 0. , 5. ]])
Optimization #1
We can replace the final three steps to perform the conditional assignment of 1s and 0s, like so -
out.flat[1::n+1] = 1
c = r[:-1].cumsum()-1
out[c,c+1] = 0
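Put together, the optimized variant might look like the sketch below. The function name blockwise_jordan_v2 is made up here, and the input is assumed to be a 2-column array of (eigenvalue, multiplicity) pairs as in the question.

```python
import numpy as np

def blockwise_jordan_v2(a):
    r = a[:, 1].astype(int)               # block sizes (multiplicities)
    out = np.diag(np.repeat(a[:, 0], r))  # eigenvalues on the main diagonal
    n = out.shape[0]
    out.flat[1::n + 1] = 1                # fill the whole superdiagonal with ones
    c = r[:-1].cumsum() - 1               # last row of every block except the final one
    out[c, c + 1] = 0                     # break the chain of ones between blocks
    return out

X = np.array([(2.2, 2), (3, 3), (5, 1)])
J = blockwise_jordan_v2(X)
print(J)
```

For the sample X this should give the same 6x6 matrix as the sample run above.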
Here's my solution:
def jordan(a):
    e = a[:,0] # eigenvalues
    m = a[:,1].astype('int') # multiplicities
    d = np.repeat(e, m) # main diagonal
    ones = np.ones(d.size - 1)
    ones[np.cumsum(m)[:-1] -1] = 0
    j = np.diag(d) + np.diag(ones, k=1)
    return j
Edit: just realized that my solution is almost the same as Divakar's.
Let's say we have a complete graph G with nodes A, B, C, created with the networkx library.
Each node has a coordinate attribute like {x: 2, y: 4}. Currently, the edge weights are 1, but they should be the Euclidean distance between nodes. I can calculate them with for loops but it is extremely inefficient.
So my question is how can I calculate the edge weights in an efficient manner?
Note: I found this but it is an old question.
Edit: I created my network as follows:
# Get a complete graph
rag = nx.complete_graph(L)
if L > 0:
    for i, node in enumerate(nodes):
        x, y = get_coord() # This function can't be changed
        rag.nodes[i]["x"] = x
        rag.nodes[i]["y"] = y
If you have the data in advance, we can use numpy and/or pandas to first calculate the distance in bulk, and then load the data into a graph.
Say for instance we can first construct an n×2-matrix with:
import numpy as np
A = np.array([list(get_coord()) for _ in range(L)])
We can then use scipy to calculate a 2d matrix of distances, for example:
from scipy.spatial.distance import pdist, squareform
B = squareform(pdist(A))
If for instance A is:
>>> A
array([[ 0.16401235, -0.60536247],
[ 0.19705099, 1.74907373],
[ 1.13078545, 2.03750256],
[ 0.52009543, 0.25292921],
[-0.8018697 , -1.45384157],
[-1.37731085, 0.20679761],
[-1.52384856, 0.14468123],
[-0.12788698, 0.22348265],
[-0.27158565, 0.21804304],
[-0.03256846, -2.85381269]])
then B will be:
>>> B
array([[ 0. , 2.354668 , 2.81414033, 0.92922536, 1.28563016,
1.74220584, 1.84700839, 0.8787431 , 0.93152683, 2.25702734],
[ 2.354668 , 0. , 0.97726722, 1.53062279, 3.35507213,
2.20391262, 2.35277933, 1.5598118 , 1.60114811, 4.60861026],
[ 2.81414033, 0.97726722, 0. , 1.88617187, 3.99056885,
3.10516145, 3.26034573, 2.20792312, 2.29718907, 5.02775867],
[ 0.92922536, 1.53062279, 1.88617187, 0. , 2.15885579,
1.897967 , 2.04680841, 0.64865114, 0.79244935, 3.15551623],
[ 1.28563016, 3.35507213, 3.99056885, 2.15885579, 0. ,
1.75751388, 1.7540036 , 1.80766956, 1.75396674, 1.59741777],
[ 1.74220584, 2.20391262, 3.10516145, 1.897967 , 1.75751388,
0. , 0.1591595 , 1.24953527, 1.10578239, 3.34300278],
[ 1.84700839, 2.35277933, 3.26034573, 2.04680841, 1.7540036 ,
0.1591595 , 0. , 1.39818396, 1.25440996, 3.34886281],
[ 0.8787431 , 1.5598118 , 2.20792312, 0.64865114, 1.80766956,
1.24953527, 1.39818396, 0. , 0.14380159, 3.07877122],
[ 0.93152683, 1.60114811, 2.29718907, 0.79244935, 1.75396674,
1.10578239, 1.25440996, 0.14380159, 0. , 3.08114051],
[ 2.25702734, 4.60861026, 5.02775867, 3.15551623, 1.59741777,
3.34300278, 3.34886281, 3.07877122, 3.08114051, 0. ]])
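And now we can construct a graph based on that matrix (from_numpy_matrix was removed in networkx 3.0; on newer versions use nx.from_numpy_array(B) instead):
G = nx.from_numpy_matrix(B)
Now we can see that the weights match: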
>>> G.get_edge_data(2,5)
{'weight': 3.105161451820312}
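If scipy is not at hand, the same distance matrix can be sketched with NumPy broadcasting alone. This is an alternative to pdist/squareform, not what the answer above uses; the helper name pairwise_distances and the three sample points are made up for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of rows of an (n, d) array."""
    coords = np.asarray(coords, dtype=float)
    # (n, 1, d) - (1, n, d) broadcasts to an (n, n, d) array of differences
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

A = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
B = pairwise_distances(A)
print(B)
```

The intermediate array holds n * n * d floats, so for very large n the scipy route is likely to be the more memory-friendly choice.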
Just like the question says, I'm trying to remove all zero vectors (i.e. [0, 0, 0, 0]) from a tensor.
Given:
array([[ 0. , 0. , 0. , 0. ],
[ 0.19999981, 0.5 , 0. , 0. ],
[ 0.4000001 , 0.29999995, 0.10000002, 0. ],
...,
[-0.5999999 , 0. , -0.0999999 , -0.20000005],
[-0.29999971, -0.4000001 , -0.30000019, -0.5 ],
[ 0. , 0. , 0. , 0. ]], dtype=float32)
I tried the following code (inspired by this SO):
x = tf.placeholder(tf.float32, shape=(10000, 4))
zero_vector = tf.zeros(shape=(1, 4), dtype=tf.float32)
bool_mask = tf.not_equal(x, zero_vector)
omit_zeros = tf.boolean_mask(x, bool_mask)
But bool_mask also seems to be of shape (10000, 4), as if it were comparing every element in the x tensor to zero, and not rows.
I thought about using tf.reduce_sum where an entire row is zero, but that will omit also rows like [1, -1, 0, 0] and I don't want that.
Ideas?
One possible way would be to sum over the absolute values of the row, in this way it will not omit rows like [1, -1, 0, 0] and then compare it with a zero vector. You can do something like this:
intermediate_tensor = reduce_sum(tf.abs(x), 1)
zero_vector = tf.zeros(shape=(1,1), dtype=tf.float32)
bool_mask = tf.not_equal(intermediate_tensor, zero_vector)
omit_zeros = tf.boolean_mask(x, bool_mask)
I tried the solution by Rudresh Panchal and it doesn't work for me, maybe due to version changes.
I found a typo in the first row: reduce_sum(tf.abs(x), 1) -> tf.reduce_sum(tf.abs(x), 1).
Also, bool_mask has rank 2 instead of rank 1, which is required:
tensor: N-D tensor.
mask: K-D boolean tensor, K <= N and K must be known statically.
In other words, the shape of bool_mask must be for example [6] not [1,6]. tf.squeeze works well to reduce dimension.
Corrected code which works for me:
intermediate_tensor = tf.reduce_sum(tf.abs(x), 1)
zero_vector = tf.zeros(shape=(1,1), dtype=tf.float32)
bool_mask = tf.squeeze(tf.not_equal(intermediate_tensor, zero_vector))
omit_zeros = tf.boolean_mask(x, bool_mask)
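The row-wise logic can be checked on a small array in NumPy before wiring it into a graph. The sketch below uses NumPy only, with a made-up 4x4 sample; in TensorFlow the corresponding building blocks would be tf.not_equal, tf.reduce_any along axis 1, and tf.boolean_mask. It also shows why a plain row sum is not a safe test.

```python
import numpy as np

x = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.2, 0.5, 0.0, 0.0],
              [1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]], dtype=np.float32)

# rank-1 mask: True for every row that has at least one non-zero entry
row_has_nonzero = np.any(x != 0, axis=1)
kept = x[row_has_nonzero]

# a plain sum would wrongly flag [1, -1, 0, 0] as a zero row
sum_mask = x.sum(axis=1) != 0

print(row_has_nonzero)
print(kept)
print(sum_mask)
```

The mask has one entry per row, which is the rank-1 shape that the boolean_mask documentation quoted above asks for.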
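Just cast the tensor to tf.bool and use it as a boolean mask. Note that this mask is element-wise, with the same shape as x, so it drops individual zero entries and returns a flattened 1-D tensor rather than removing whole zero rows: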
boolean_mask = tf.cast(x, dtype=tf.bool)
no_zeros = tf.boolean_mask(x, boolean_mask, axis=0)
In general we could have matrices of arbitrary sizes. For my application it is necessary to have a square matrix. Also, the dummy entries should have a specified value. I am wondering if there is anything built into numpy?
Or, what is the easiest way of doing it?
EDIT:
The matrix X already exists and it is not square. We want to pad it with the given dummy value to make it square. All the original values should stay the same.
Thanks a lot
Building upon the answer by LucasB here is a function which will pad an arbitrary matrix M with a given value val so that it becomes square:
import numpy
def squarify(M,val):
    (a,b)=M.shape
    if a>b:
        padding=((0,0),(0,a-b))
    else:
        padding=((0,b-a),(0,0))
    return numpy.pad(M,padding,mode='constant',constant_values=val)
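As a quick check of both branches, here is the same function applied to a wide and a tall matrix. The function is repeated so the snippet runs on its own, and the sample arrays and the pad value -1 are made up.

```python
import numpy as np

def squarify(M, val):
    a, b = M.shape
    if a > b:
        padding = ((0, 0), (0, a - b))   # tall: add columns on the right
    else:
        padding = ((0, b - a), (0, 0))   # wide (or square): add rows at the bottom
    return np.pad(M, padding, mode='constant', constant_values=val)

wide = np.arange(6).reshape(2, 3)
tall = np.arange(6).reshape(3, 2)
wide_sq = squarify(wide, -1)
tall_sq = squarify(tall, -1)
print(wide_sq)
print(tall_sq)
```

A matrix that is already square goes through the else branch with zero padding and comes back unchanged.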
Since Numpy 1.7, there's the numpy.pad function. Here's an example:
>>> x = np.random.rand(2,3)
>>> np.pad(x, ((0,1), (0,0)), mode='constant', constant_values=42)
array([[ 0.20687158, 0.21241617, 0.91913572],
[ 0.35815412, 0.08503839, 0.51852029],
[ 42. , 42. , 42. ]])
For a 2D numpy array m it’s straightforward to do this by creating a max(m.shape) x max(m.shape) array of ones p and multiplying this by the desired padding value, before setting the slice of p corresponding to m (i.e. p[0:m.shape[0], 0:m.shape[1]]) to be equal to m.
This leads to the following function, where the first line deals with the possibility that the input has only one dimension (i.e. is an array rather than a matrix):
import numpy as np
def pad_to_square(a, pad_value=0):
    m = a.reshape((a.shape[0], -1))
    padded = pad_value * np.ones(2 * [max(m.shape)], dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded
So, for example:
>>> r1 = np.random.rand(3, 5)
>>> r1
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468]])
>>> pad_to_square(r1, 3)
array([[ 0.85950957, 0.92468279, 0.93643261, 0.82723889, 0.54501699],
[ 0.05921614, 0.94946809, 0.26500925, 0.02287463, 0.04511802],
[ 0.99647148, 0.6926722 , 0.70148198, 0.39861487, 0.86772468],
[ 3. , 3. , 3. , 3. , 3. ],
[ 3. , 3. , 3. , 3. , 3. ]])
or
>>> r2=np.random.rand(4)
>>> r2
array([ 0.10307689, 0.83912888, 0.13105124, 0.09897586])
>>> pad_to_square(r2, 0)
array([[ 0.10307689, 0. , 0. , 0. ],
[ 0.83912888, 0. , 0. , 0. ],
[ 0.13105124, 0. , 0. , 0. ],
[ 0.09897586, 0. , 0. , 0. ]])
etc.
I have a two dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5,4) array, I would get 5 results, or an array of dimension (5,7).
I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?
Thanks!
EDIT:
Based on the chosen answer plus the comment from mtrw, I have the following function:
import numpy as np
from numpy.fft import fft, ifft, fftshift
def xcorr(x):
    """FFT based autocorrelation function, which is faster than numpy.correlate"""
    # x is supposed to be an array of sequences, of shape (totalelements, length)
    fftx = fft(x, n=(length*2-1), axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret
Note that length is a global variable in my code, so be sure to declare it. I also didn't restrict the result to real numbers, since I need to take into account complex numbers as well.
Using FFT-based autocorrelation:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
print(data)
##[[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(dataAC)
##[[ 14. 8. 6. 8.]
## [ 126. 120. 118. 120.]
## [ 366. 360. 358. 360.]
## [ 734. 728. 726. 728.]
## [ 1230. 1224. 1222. 1224.]]
I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.
EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print(dataPadded)
##[[ 0. 1. 2. 3. 0. 0. 0.]
## [ 4. 5. 6. 7. 0. 0. 0.]
## [ 8. 9. 10. 11. 0. 0. 0.]
## [ 12. 13. 14. 15. 0. 0. 0.]
## [ 16. 17. 18. 19. 0. 0. 0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(numpy.round(dataAC, 10))
##[[ 14. 8. 3. 0. 0. 3. 8.]
## [ 126. 92. 59. 28. 28. 59. 92.]
## [ 366. 272. 179. 88. 88. 179. 272.]
## [ 734. 548. 363. 180. 180. 363. 548.]
## [ 1230. 920. 611. 304. 304. 611. 920.]]
There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.
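One possible way to use that symmetry, assuming the input is real, is to switch to the real-input transforms np.fft.rfft and np.fft.irfft, which only compute the non-redundant half of the spectrum. This is a sketch rather than a benchmarked improvement, and the name autocorr_real is made up.

```python
import numpy as np

def autocorr_real(data):
    """Row-wise autocorrelation of real data, zero-padded to 2*n - 1 samples."""
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    nfft = 2 * n - 1                      # long enough to avoid wrap-around
    spectrum = np.fft.rfft(data, n=nfft, axis=1)
    power = spectrum.real ** 2 + spectrum.imag ** 2   # |F|^2, a real array
    # n must be passed explicitly because nfft is odd
    return np.fft.irfft(power, n=nfft, axis=1)

data = np.arange(5 * 4).reshape(5, 4)
ac = autocorr_real(data)
print(np.round(ac, 10))
```

The layout is the same as in the padded version above: non-negative lags first, then the negative lags wrapped to the end of each row.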
For really large arrays it becomes important to have n = 2 ** p, where p is an integer. This will save you huge amounts of time. For example:
import numpy as np
from numpy.fft import fft, ifft, fftshift
def xcorr(x):
    l = 2 ** int(np.log2(x.shape[1] * 2 - 1))
    fftx = fft(x, n = l, axis = 1)
    ret = ifft(fftx * np.conjugate(fftx), axis = 1)
    ret = fftshift(ret, axes=1)
    return ret
This might give you wrap-around errors. For large arrays the auto correlation should be insignificant near the edges, though.
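The wrap-around comes from int(np.log2(...)) rounding the FFT length down. A variant that rounds up to the next power of two keeps the speed benefit without wrapping, at the cost of a somewhat longer transform. The sketch below assumes that trade-off is acceptable; the name xcorr_padded is made up, and the output is trimmed to the 2*n - 1 meaningful lags.

```python
import numpy as np

def xcorr_padded(x):
    """Row-wise linear autocorrelation via FFT, zero-padded to a power of two."""
    x = np.asarray(x)
    n = x.shape[1]
    full = 2 * n - 1                       # number of lags in a linear autocorrelation
    nfft = 1 << (full - 1).bit_length()    # smallest power of two >= full
    fx = np.fft.fft(x, n=nfft, axis=1)
    ac = np.fft.ifft(fx * np.conjugate(fx), axis=1)
    # lags 0..n-1 sit at the front, lags -(n-1)..-1 at the back
    return np.concatenate((ac[:, nfft - (n - 1):], ac[:, :n]), axis=1)

data = np.arange(5 * 4).reshape(5, 4)
ac = xcorr_padded(data).real
print(np.round(ac, 10))
```

For real input each row should match np.correlate(row, row, mode='full'), with lag zero in the middle column.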
Maybe it's just a preference, but I wanted to follow from the definition. I personally find it a bit easier to follow that way. This is my implementation for an arbitrary nd array.
from itertools import product
from numpy import asarray, empty, roll
def autocorrelate(x):
    """
    Compute the multidimensional autocorrelation of an nd array.
    input: an nd array of floats
    output: an nd array of autocorrelations
    """
    # used for transposes: each transpose brings the next axis to the front
    # (rolling by -1 rather than 1 only matters for 3 or more dimensions)
    t = roll(range(x.ndim), -1)
    # pairs of indexes
    # the first is for the autocorrelation array
    # the second is the shift
    ii = [list(enumerate(range(1, s - 1))) for s in x.shape]
    # initialize the resulting autocorrelation array
    acor = empty(shape=[len(s0) for s0 in ii])
    # iterate over all combinations of directional shifts
    for i in product(*ii):
        # extract the indexes for
        # the autocorrelation array
        # and original array respectively
        i1, i2 = asarray(i).T
        x1 = x.copy()
        x2 = x.copy()
        for i0 in i2:
            # clip the unshifted array at the end
            x1 = x1[:-i0]
            # and the shifted array at the beginning
            x2 = x2[i0:]
            # prepare to do the same for
            # the next axis
            x1 = x1.transpose(t)
            x2 = x2.transpose(t)
        # normalize shifted and unshifted arrays
        x1 -= x1.mean()
        x1 /= x1.std()
        x2 -= x2.mean()
        x2 /= x2.std()
        # compute the autocorrelation directly
        # from the definition
        acor[tuple(i1)] = (x1 * x2).mean()
    return acor