Multiple least squares linear regressions in one command? - python

Say I have 10 4*4 numpy arrays:
[[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]]
[[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5]]
etc...
What I want to do is calculate a least squares linear regression for each entry in the matrix.
So I want to take m0[0][0], m1[0][0], m2[0][0], etc... and calculate the linear regression. Then do the same for the [0][1] values.
Is there any way of doing this without having to first extract all [0][0] values into a new array and calling numpy.linalg.lstsq? Can I somehow pass my 10*4*4 array to numpy.linalg.lstsq so that it will calculate multiple regressions?

Give this a shot... I'm sure there is a way to make this more efficient though.
import numpy as np

def my_lin_reg(arr0):
    n = arr0.shape[0]
    s = arr0.shape[1] * arr0.shape[2]
    arr1 = arr0.swapaxes(0, 2).reshape(s, n)
    x = np.vstack([np.arange(n), np.ones(n)]).T
    mc = []
    for sub_arr in arr1:
        mc.append(np.linalg.lstsq(x, sub_arr, rcond=None)[0])
    return np.array(mc)
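Incidentally, np.linalg.lstsq accepts a 2-D right-hand side and solves one least-squares problem per column, so the Python loop can be dropped entirely. A sketch (batched_lin_reg is a hypothetical name; it keeps cells in plain row-major order rather than the swapaxes ordering above):

```python
import numpy as np

def batched_lin_reg(arr0):
    """Fit y = slope*k + intercept across the first axis for every cell."""
    n = arr0.shape[0]
    b = arr0.reshape(n, -1)                        # one column per (i, j) cell
    x = np.vstack([np.arange(n), np.ones(n)]).T    # shared design matrix, shape (n, 2)
    coeffs = np.linalg.lstsq(x, b, rcond=None)[0]  # shape (2, number_of_cells)
    slopes = coeffs[0].reshape(arr0.shape[1:])
    intercepts = coeffs[1].reshape(arr0.shape[1:])
    return slopes, intercepts

# Stack like the question: the k-th 4x4 matrix has rows of k+1 ... k+4
arr = np.array([[[k + r + 1] * 4 for r in range(4)] for k in range(10)],
               dtype=float)
slopes, intercepts = batched_lin_reg(arr)
print(slopes)      # every cell grows by 1 per step
print(intercepts)  # row r starts at r + 1
```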


Are the elements created by numpy.repeat() views of the original numpy.array or unique elements?

I have an array that I'd like to repeat 4 times to form a 3D array.
Achieved via a mixture of NumPy and Python methods:
>>> z = np.arange(9).reshape(3, 3)
>>> z
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> z2 = []
>>> for i in range(4):
...     z2.append(z)
...
>>> z2
[array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]), array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]), array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]), array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])]
>>> z2 = np.array(z2)
>>> z2
array([[[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])
Achieved via Pure NumPy:
>>> z2 = np.repeat(z[np.newaxis, ...], 4, axis=0)
>>> z2
array([[[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]],

       [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]])
Are the elements created by numpy.repeat() views of the original numpy.array() or unique elements?
If the latter, is there an equivalent NumPy function that can create views of the original array the same way as numpy.repeat()?
I think such an ability could help reduce the buffer space of z2 when z is large and there are many repeats of z involved.
A follow-up on @FrankYellin's answer:
>>> z = np.arange(9).reshape(3, 3)
>>> z
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> z2 = np.repeat(z[np.newaxis,...], 1_000_000_000, axis=0)
>>> z2.nbytes
72000000000
>>> y2 = np.broadcast_to(z, (1_000_000_000, 3, 3))
>>> y2.nbytes
72000000000
The nbytes from using np.broadcast_to() is the same as from np.repeat(). This is surprising at first, given that the former returns a read-only view on the original z array with the given shape; evidently nbytes reports the size implied by the shape and itemsize, not the memory actually allocated. That said, I did notice that np.broadcast_to() created the y2 array instantaneously, while the creation of z2 via np.repeat() took about 40 seconds to complete. Hence np.broadcast_to() yielded significantly faster performance.
If you want a writable version, it is doable, but it's really ugly.
If you want a read-only version, np.broadcast_to(z, (4, 3, 3)) should be all you need.
Now the ugly writable version. Be careful. You can corrupt memory if you mess the arguments up.
>>> z.shape
(3, 3)
>>> z.strides
(24, 8)
>>> from numpy.lib.stride_tricks import as_strided
>>> z2 = as_strided(z, shape=(4, 3, 3), strides=(0, 24, 8))
and you end up with:
>>> z2[1, 1, 1]
4
>>> z2[1, 1, 1] = 100
>>> z2[2, 1, 1]
100
You are using strides to overlay a second array on top of the first one: you set the new shape, and you prefix a 0 to the previous strides, indicating that stepping along the first dimension never advances through the underlying data.
Make sure you understand strides.
numpy.repeat creates a new array, not a view (you can check by looking at the __array_interface__ field). In fact, it is not possible to create such a view in the general case, since NumPy views do not support this pattern. A view is basically just an object containing a pointer to a raw memory buffer, some strides, a shape and a dtype. While it is possible to repeat one item N times with a 0 stride, it is not possible to repeat two items N times without adding a new dimension to the output array. Thus, there is no way to build a view-based function with the same output shape as numpy.repeat. If adding a new dimension is OK, then you can build an array with a new dimension whose stride is set to 0; the answer from @FrankYellin gives a good example. Note that reshaping or ravelling the resulting array forces a copy. Supporting such advanced views would make the NumPy code more complex and/or less efficient for a feature that is only rarely used.
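A quick way to check the copy-versus-view distinction described above is np.shares_memory (a minimal sketch):

```python
import numpy as np

z = np.arange(9).reshape(3, 3)

r = np.repeat(z[np.newaxis, ...], 4, axis=0)   # allocates a new buffer
v = np.broadcast_to(z, (4, 3, 3))              # read-only view over z

print(np.shares_memory(z, r))   # False: repeat copied the data
print(np.shares_memory(z, v))   # True: broadcast_to reuses z's buffer
print(v.strides)                # first stride is 0, so the leading axis is free
```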

numpy: floor values of array to different array of values (similar to np.floor)

It is kind of hard to explain exactly what I mean, so I'll give an example of the function I would like:
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [0, 4, 5]
c = np.func(a, b)
print(c)
--> [[0, 0, 0],
    [4, 5, 5],
    [5, 5, 5]]
In words: every element of array a should be lowered to the closest value of array b that does not exceed it. The values could be floats.
I could do this in a loop, but I'm sure there is a numpy way of doing this.
Any tips much appreciated.
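One vectorized possibility (a sketch, assuming b is sorted in ascending order and no element of a is smaller than b[0]) is np.searchsorted, which finds for each element of a the rightmost value of b not exceeding it:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = np.array([0, 4, 5])  # must be sorted ascending

# Index of the largest b[i] <= each element of a; elements smaller than
# b[0] would yield index -1 and need clipping or special handling.
idx = np.searchsorted(b, a, side='right') - 1
c = b[idx]
print(c)
# [[0 0 0]
#  [4 5 5]
#  [5 5 5]]
```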

What is the correct way to format the parameters for DTW in Similarity Measures?

I am trying to use the DTW algorithm from the Similarity Measures library. However, I get hit with an error that states a 2-Dimensional Array is required. I am not sure I understand how to properly format the data, and the documentation is leaving me scratching my head.
https://github.com/cjekel/similarity_measures/blob/master/docs/similaritymeasures.html
According to the documentation, the function takes two arguments (exp_data and num_data) for the data sets, which makes sense. What doesn't make sense to me is:
exp_data : array_like
Curve from your experimental data. exp_data is of (M, N) shape, where
M is the number of data points, and N is the number of dimensions
This is the same for both the exp_data and num_data arguments.
So, for further clarification, let's say I am implementing the fastdtw library. It looks like this:
import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean
x = np.array([1, 2, 3, 3, 7])
y = np.array([1, 2, 2, 2, 2, 2, 2, 4])
distance, path = fastdtw(x, y, dist=euclidean)
print(distance)
print(path)
Or I can implement the same code with dtaidistance:
from dtaidistance import dtw
x = [1, 2, 3, 3, 7]
y = [1, 2, 2, 2, 2, 2, 2, 4]
distance = dtw.distance(x, y)
print(distance)
However, using this same code with Similarity Measures results in an error. For example:
import similaritymeasures
import numpy as np
x = np.array([1, 2, 3, 3, 7])
y = np.array([1, 2, 2, 2, 2, 2, 2, 4])
dtw, d = similaritymeasures.dtw(x, y)
print(dtw)
print(d)
So, my question is why is a 2-Dimensional Array required here? What is similarity measures doing that the other libraries are not?
And if Similarity measures requires data of (M, N) shape, where M is the number of data points, and N is the number of dimensions, then where does my data go? Or, phrased differently, M is the number of data points, so in the above examples x has 5 data points. And N is the number of dimensions, and in the above examples x has one dimension. So am I passing it [5, 1]? This doesn't seem right for obvious reasons, but I can't find any sample code that makes this any clearer.
My reason for wanting to use similaritymeasures is that it has multiple other functions that I would like to leverage, such as Fréchet distance and Hausdorff distance. I'd really like to understand how to utilize it.
I really appreciate any help.
It appears the solution in my case was to include the index in the array. For example, if your data looks like this:
x = [1, 2, 3, 3, 7]
y = [1, 2, 2, 2, 2, 2, 2, 4]
It needs to look like this:
x = [[1, 1], [2, 2], [3, 3], [4, 3], [5, 7]]
y = [[1, 1], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2], [7, 2], [8, 4]]
In my case, x and y were two separate columns in a pandas dataframe. My solution was as follows:
df['index'] = df.index
x1 = df['index']
y1 = df['column1']
P = np.array([x1, y1]).T
x2 = df['index']
y2 = df['column2']
Q = np.array([x2, y2]).T
dtw, d = similaritymeasures.dtw(P, Q)
print(dtw)
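For the common case of pairing a 1-D series with its sample index, np.column_stack builds the required (M, 2) curves directly (a sketch; the index here starts at 0 rather than 1, which does not change the shape requirement):

```python
import numpy as np

x = np.array([1, 2, 3, 3, 7])
y = np.array([1, 2, 2, 2, 2, 2, 2, 4])

# Each curve: column 0 is the sample index, column 1 is the value
P = np.column_stack([np.arange(len(x)), x])
Q = np.column_stack([np.arange(len(y)), y])

print(P.shape, Q.shape)  # (5, 2) (8, 2)
# P and Q now have the (M, N) shape that similaritymeasures.dtw expects
```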

How to rotate a matrix (nested list) counter clockwise by 90 degrees

I'm trying to rotate a matrix counter clockwise by 90 degrees.
For example, if:
m = [[1, 2, 3],
     [2, 3, 3],
     [5, 4, 3]]
then the result should be
m = [[3, 3, 3],
     [2, 3, 4],
     [1, 2, 5]]
So far, I found:
rez = [[m[j][i] for j in range(len(m))] for i in range(len(m[0]))]
for row in rez:
    print(row)
This gives me
[1, 2, 5]
[2, 3, 4]
[3, 3, 3]
This is close, but the rows would need to be reversed. Does anyone know a simple way to rotate this matrix counter clockwise by 90 degrees?
You could do the following:
m = [[1, 2, 3],
     [2, 3, 3],
     [5, 4, 3]]
result = list(map(list, zip(*m)))[::-1]
print(result)
Output
[[3, 3, 3],
 [2, 3, 4],
 [1, 2, 5]]
With map(list, zip(*m)) you create an iterable of the columns, and with the expression list(...)[::-1] you convert that iterable into a list and reverse it.
What you basically do here is map a matrix A to a matrix B such that:
B[i][j] = A[j][i]
In case you rotate the elements, that means that if you rotate an n×m matrix, then:
B[i][j] = A[j][n-1-i]
So we can calculate this as:
rez = [[m[j][ni] for j in range(len(m))] for ni in range(len(m[0])-1, -1, -1)]
which is thus the transpose, but then "reversed". Using indices is however typically not how you do such processing in Python, since it only works for items that are subscriptable, so I advise you to look for a more elegant solution.
But that being said, numpy offers a numpy.rot90 function to rotate matrices:
>>> np.rot90(m)
array([[3, 3, 3],
       [2, 3, 4],
       [1, 2, 5]])
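np.rot90 also takes a k argument giving the number of 90-degree counter-clockwise turns (negative k rotates clockwise), which saves wrapping it in a loop:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [2, 3, 3],
              [5, 4, 3]])

print(np.rot90(m, k=1))   # one counter-clockwise turn, as above
print(np.rot90(m, k=-1))  # one clockwise turn
print(np.rot90(m, k=2))   # 180 degrees
```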
Another option is to use scipy.ndimage.rotate:
Rotate an array.
The array is rotated in the plane defined by the two axes given by the
axes parameter using spline interpolation of the requested order.
import numpy as np
from scipy import ndimage
m = np.matrix([[1, 2, 3],
               [2, 3, 3],
               [5, 4, 3]])
ndimage.rotate(m, 90.0)  # angle as float
Out:
array([[3, 3, 3],
       [2, 3, 4],
       [1, 2, 5]])
You can get the same result by using the zip() function to transpose the rows and columns of a nested list, then reversing the nested list with [::-1] and putting it in an np.matrix:
matrix = [[1, 2, 3],
          [2, 3, 3],
          [5, 4, 3]]
np.matrix(list(zip(*matrix)))[::-1]
Out:
matrix([[3, 3, 3],
        [2, 3, 4],
        [1, 2, 5]])

NxN matrix permutations [closed]

Closed 6 years ago.
I have this homework to do for my Python class at uni and I can't get my head around it. Basically I need to write a program that returns an NxN array (preferably using numpy) in such a scenario:
123456
212345
321234
432123
543212
654321
I've attempted to create a simple 6x6 array, for example
X = np.full((n, n), np.arange(1, n+1))
which returns
123456
123456
123456
123456
123456
123456
But these are simply permutations that switch the last element with the first and push the ones in the middle to the right; as mentioned earlier, it's more complex than that. Thanks in advance.
Can you come up with a formula for X[i,j] in terms of i and j? (I can, but this is homework!)
If so, you can do:
ii, jj = np.indices((n, n))  # note: 'is' is a reserved word in Python, so pick other names
X = your_formula(ii, jj)
For example, if you wanted X[i,j] = i + j, you could do
ii, jj = np.indices((n, n))
X = ii + jj
Which for n=3 would give
012
123
234
If you're writing loops to solve array problems, you're usually doing it wrong. Just use an upper triangular matrix.
>>> from scipy.linalg import circulant
>>> import numpy as np
>>> arr = circulant(np.arange(1, 7)).T
>>> np.triu(arr, 1).T + np.triu(arr)
array([[1, 2, 3, 4, 5, 6],
       [2, 1, 2, 3, 4, 5],
       [3, 2, 1, 2, 3, 4],
       [4, 3, 2, 1, 2, 3],
       [5, 4, 3, 2, 1, 2],
       [6, 5, 4, 3, 2, 1]])
As this is homework, I will not give you the complete answer, but I'll give you a tip that should be enough to get you going.
As you have all those numbers, consider putting them into an np.array; that way you could use np.reshape(vector, [lines, columns]). Here's the link to the NumPy documentation.
This is probably not an answer that you can use for your homework, but...
In [619]: from scipy.linalg import toeplitz
In [620]: toeplitz(range(1, 7))
Out[620]:
array([[1, 2, 3, 4, 5, 6],
       [2, 1, 2, 3, 4, 5],
       [3, 2, 1, 2, 3, 4],
       [4, 3, 2, 1, 2, 3],
       [5, 4, 3, 2, 1, 2],
       [6, 5, 4, 3, 2, 1]])
See the documentation for scipy.linalg.toeplitz for more information.
There are several options to do that:
Perhaps the most straightforward approach would be that of using slicings and list comprehensions
def myfunc(n):
    x = list(range(1, n+1))  # a list, so the slices can be concatenated with +
    return np.asarray([x[1:i+1][::-1] + x[:n-i] for i in range(n)])
You could also use Matlab-like functions diag and transpose
def myfunc(n):
    x = np.diag(np.ones(n))
    for i in range(2, n+1):
        d = np.diag(i*np.ones(n-i+1), i-1)
        x += d + d.T
    return x
Loop-based solutions - as @GoBrewers14 correctly pointed out - are inefficient for creating large arrays though. In this case you should use a vectorized algorithm instead. If you don't wish to employ SciPy's toeplitz and circulant functions suggested in the other answers, broadcasting is your friend.
def myfunc(n):
    x = np.arange(n).reshape((1, n))
    return np.abs(x - x.T) + 1
And this is what you get by running any of the implementations above:
>>> myfunc(6)
array([[1, 2, 3, 4, 5, 6],
       [2, 1, 2, 3, 4, 5],
       [3, 2, 1, 2, 3, 4],
       [4, 3, 2, 1, 2, 3],
       [5, 4, 3, 2, 1, 2],
       [6, 5, 4, 3, 2, 1]])
Hope this helps
