Correlation of columns of two arrays in python - python

I have two arrays: 900x421 and 900x147. I need to correlate every column of the first array with every column of the second, so that the output is 421x147. In MATLAB, the function corr() does this, but I can't find a function that does the same in Python.

The numpy.corrcoef function is the way to go. np.corrcoef(x, y) does not need x and y to have the same shape, only the same number of observations, but the simplest route is to concatenate the two arrays column-wise. Let's say arr1 has shape 900x421 and arr2 has shape 900x147. You can do the following:
import numpy as np
two_arrays = np.concatenate((arr1, arr2), axis=1) # 900x568
corr = np.corrcoef(two_arrays.T) # 568x568 array
desired_output = corr[0:421, 421:] # 421x147: arr1 columns (rows) vs arr2 columns (columns)
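By default (rowvar=True), np.corrcoef treats each row as a variable and each column as an observation. That is why we need to transpose the array. Passing rowvar=False instead makes it treat columns as variables, so the transpose is not needed.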
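If the 568x568 intermediate is wasteful, the 421x147 block can be computed directly: center each column, scale it to unit norm, and take dot products between columns. A minimal sketch, where column_corr is a hypothetical helper name and random data stands in for arr1 and arr2; the last lines cross-check it against np.corrcoef with rowvar=False.

```python
import numpy as np

def column_corr(x, y):
    """Pearson correlation between every column of x and every column of y.

    x has shape (n, p), y has shape (n, q); the result has shape (p, q).
    """
    # Center each column
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Scale each column to unit Euclidean norm; the 1/(n-1) factors cancel
    xs = xc / np.linalg.norm(xc, axis=0)
    ys = yc / np.linalg.norm(yc, axis=0)
    # Dot products of unit-norm centered columns are the correlations
    return xs.T @ ys

# Random stand-ins for the real 900x421 and 900x147 arrays
rng = np.random.default_rng(0)
arr1 = rng.normal(size=(900, 421))
arr2 = rng.normal(size=(900, 147))

fast = column_corr(arr1, arr2)  # shape (421, 147)

# Cross-check: rowvar=False treats columns as variables, giving a 568x568 matrix
full = np.corrcoef(arr1, arr2, rowvar=False)
print(fast.shape, np.allclose(fast, full[:421, 421:]))
```

The design choice here is only about memory: both routes compute the same Pearson coefficients, but the helper never builds the arr1-vs-arr1 and arr2-vs-arr2 blocks.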

Related

How can I put two NumPy arrays into a matrix with two columns?

I am trying to put two NumPy arrays into a matrix or horizontally stack them. Each array is 76 elements long, and I want the ending matrix to have 76 rows and 2 columns. I basically have a velocity/frequency model and want to have two columns with corresponding frequency/velocity values in each row.
Here is my code ('f' is frequency and 'v' the velocity values, previously already defined):
print(f.shape)
print(v.shape)
print(type(f))
print(type(v))
x = np.concatenate((f, v), axis = 1)
This returns
(76,)
(76,)
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
And an error about the concatenate line that says:
AxisError: axis 1 is out of bounds for array of dimension 1
I've also tried hstack instead of concatenate, as well as vstack and transposing with .T, and I get the same error. I've also tried using Pandas, but I need to use NumPy, because when I save the result to a txt/dat file, Pandas adds an extra column with row numbering (the index) that I do not need.
Your problem is that your vectors are one-dimensional, like in this example:
f_1d = np.array([1,2,3,4])
print(f_1d.shape)
> (4,)
As you can see, only the first dimension is given. So instead you could create your vectors like this:
f = np.expand_dims(np.array([1,2,3,4]), axis=1)
v = np.expand_dims(np.array([5,6,7,8]), axis=1)
print(f.shape)
print(v.shape)
>(4,1)
>(4,1)
As you may notice, the second dimension is equal to one, but now your vector is represented in matrix form.
It is now possible to transpose the matrix-vectors:
f_t = f.T
v_t = v.T
print(f_t.shape)
> (1,4)
Instead of using concatenate, you could use vstack or hstack to create cleaner code:
x = np.hstack((f,v))
x_t = np.vstack((f_t,v_t))
print(x.shape)
print(x_t.shape)
>(4,2)
>(2,4)
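For 1-D inputs there is also a shortcut: np.column_stack treats each 1-D array as a column, so no expand_dims step is needed. A minimal sketch with made-up values standing in for f and v; it also shows np.savetxt, which writes only the array values (no index column), addressing the file-saving concern from the question. The io.StringIO buffer is just a stand-in for a real txt/dat file.

```python
import io
import numpy as np

# Stand-ins for the 76-element frequency and velocity vectors (assumed data)
f = np.linspace(1.0, 76.0, 76)
v = np.linspace(100.0, 175.0, 76)

# column_stack treats 1-D inputs as columns, so no expand_dims is needed
x = np.column_stack((f, v))
print(x.shape)  # (76, 2)

# Equivalent: add the missing axis with None (np.newaxis) and concatenate
x2 = np.concatenate((f[:, None], v[:, None]), axis=1)

# savetxt writes only the array values, with no index column
buf = io.StringIO()
np.savetxt(buf, x, fmt="%.3f", delimiter="\t")
print(buf.getvalue().splitlines()[0])
```

In real use, pass a filename such as "model.dat" to np.savetxt instead of the buffer.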

Apply Box Cox transformation to two columns simultaneously

I want to apply a Box-Cox transformation to two different columns. The twist is that I'm being asked to choose the lambda that's optimal for both columns simultaneously.
scipy.stats.boxcox only accepts one-dimensional arrays.
How can I apply a Box-Cox transformation to two columns subject to lambda_1 = lambda_2?
Here's my data.
I would like to transform the columns SPEED and CAP.
import pandas as pd
from scipy import stats
df = pd.read_csv('https://raw.githubusercontent.com/BenjaminKay/berndt-econometrics/master/data/floppy_ver/CHAP4.DAT/COLE',
sep='\t')
stats.boxcox(df[['SPEED','CAP']].values)
ValueError: Data must be 1-dimensional.
It sounds like you want boxcox to treat the two columns as a single data set. You could merge them into a single 1-d array, apply boxcox, and then restore the shape afterwards, as in the following.
Get the values as a 2-d array:
In [63]: data = df[['SPEED','CAP']].values
Pass the data to boxcox, using the .ravel() method to flatten it into a 1-d array first:
In [64]: result1d, lam = stats.boxcox(data.ravel())
In [65]: lam
Out[65]: -0.02063317824310837
Reshape result1d back to the original 2-d shape:
In [66]: result = result1d.reshape(data.shape)
In [67]: result.shape
Out[67]: (91, 2)
In [68]: result[:8]
Out[68]:
array([[-1.82384013, 7.23194418],
[-4.09393704, 3.25939313],
[-3.80017243, 4.39314839],
[-3.80017243, 4.39314839],
[-3.80017243, 4.39314839],
[-3.80017243, 4.39314839],
[-3.11153324, 5.01897958],
[-3.11153324, 5.01897958]])
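The same flatten, transform, reshape steps can be checked offline. A minimal sketch, assuming synthetic lognormal data in place of the SPEED and CAP columns (the real data comes from the CSV URL above); it confirms that each column of the result equals a per-column Box-Cox transform with the shared lambda.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive stand-in for df[['SPEED','CAP']].values
rng = np.random.default_rng(1)
data = np.column_stack((rng.lognormal(1.0, 0.5, 91),
                        rng.lognormal(3.0, 0.8, 91)))

# Fit one lambda on the pooled values, then restore the (91, 2) shape
result1d, lam = stats.boxcox(data.ravel())
result = result1d.reshape(data.shape)

# With lmbda given, boxcox only transforms, so each column uses the same lambda
col0 = stats.boxcox(data[:, 0], lmbda=lam)
col1 = stats.boxcox(data[:, 1], lmbda=lam)
print(result.shape, np.allclose(result[:, 0], col0), np.allclose(result[:, 1], col1))
```

Note that pooling also treats both columns as one sample with a single mean and variance. If the columns have very different scales, maximizing the sum of the two per-column log-likelihoods (for example with stats.boxcox_llf) over a shared lambda is another reasonable reading of "optimal for both".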

Iterating through rows in python

I have a (68x2) matrix named shape and I am trying to iterate through all 68 rows, placing column 0 and column 1 of shape in array B. This is then multiplied by a (3x3) transformation matrix A. My intent was to create a single array (which is why I used np.append), but instead I am getting 68 separate two-dimensional matrices and I do not know why.
Here is my code:
import numpy as np
for row in shape:
    B = np.array([[row[0]],[row[1]],[1]])
    result = np.matmul(A,B)
    result = np.append(result[0], result[1], axis = 0)
    print(result)
Anyone know how I can fix my problem?
You can concatenate a column of ones onto your shape array and then transform all rows at once using a single matrix multiplication:
result = (np.concatenate((shape, np.ones((68, 1))), axis=1) @ A.T)[:, :2]
Because each row is a row vector rather than the column vector B from your loop, you multiply by the transpose of the transformation matrix, A.T, rather than by A itself: (A b)^T = b^T A^T.
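A quick way to confirm the vectorized form matches the loop is to compute both and compare. A minimal sketch, assuming random stand-ins for shape and A; the loop mirrors the question's A @ B with B = [x, y, 1]^T.

```python
import numpy as np

rng = np.random.default_rng(2)
shape = rng.normal(size=(68, 2))   # stand-in for the (68, 2) point array
A = rng.normal(size=(3, 3))        # stand-in for the 3x3 transform

# Loop version: A @ [x, y, 1]^T per row, keeping the first two components
looped = np.array([(A @ np.array([x, y, 1.0]))[:2] for x, y in shape])

# Vectorized version: append a column of ones, multiply by A.T once
homog = np.hstack((shape, np.ones((shape.shape[0], 1))))  # (68, 3)
vectorized = (homog @ A.T)[:, :2]                          # (68, 2)

print(vectorized.shape, np.allclose(looped, vectorized))
```

Using shape.shape[0] instead of a hard-coded 68 keeps the code working for any number of points.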

add column Numpy array python

I am very new to Python but very familiar with R, and my question about NumPy arrays is very simple:
Observe:
I have one array X of dimension (100,2) of floating point type and I want to add a 3rd column, preferably into a new NumPy array of dimension (100,3), such that the 3rd column = col(1)^2 for every row of X.
My understanding is Numpy arrays are generally of fixed dimension so I'm OK with creating a new array of dim 100x3, I just don't know how to do so using Numpy arrays.
Thanks!
One way to do this is by creating a new array and then concatenating it. For instance, say that M is currently your array.
You can compute col(1)^2 as C = M[:, 0] ** 2 (which I'm interpreting as column 1 squared, not column 1 raised to the power of the values in column 2). C is now an array with shape (100,), so we reshape it using C = np.expand_dims(C, 1), which adds a new axis of length 1 and gives the new column shape (100, 1). This is important because np.concatenate requires both arrays to have the same number of dimensions.
The last step is to concatenate them using np.concatenate. In total, the code looks like this:
C = M[:, 0] ** 2
C = np.expand_dims(C, 1)
M = np.concatenate([M, C], axis=1) # third column is now col(1) ** 2
If you're the kind of person who likes to do things in one line, you have:
M = np.concatenate([M, np.expand_dims(M[:, 0] ** 2, 1)], axis=1)
That being said, I would recommend looking at Pandas; in my opinion it supports these operations more naturally. In Pandas, it would be:
M["your_col_3_name"] = M["your_col_1_name"] ** 2
where M is a pandas dataframe.
Append with axis=1 should work.
a = np.zeros((5,2))
b = np.ones((5,1))
print(np.append(a,b,axis=1))
This prints:
[[0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]
# generate an array with shape (100,2), filled with 2
a = np.full((100,2),2)
# calculate the square of the first column; this is a 1-d array
squared=a[:,0]**2
# concatenate the 1-d array to a,
# first converting it to a 2-d array with shape (100,1) via reshape(-1,1)
c = np.concatenate((a,squared.reshape(-1,1)),axis=1)
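Two shorter variants avoid the explicit reshape. A minimal sketch, assuming a random (100, 2) array as a stand-in for X: slicing with X[:, :1] keeps the column two-dimensional, and np.column_stack promotes a 1-D array to a column on its own.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((100, 2))  # stand-in for the (100, 2) float array

# X[:, :1] keeps a 2-D slice of shape (100, 1), so no reshape is needed
a = np.hstack((X, X[:, :1] ** 2))

# column_stack promotes the 1-D squared column automatically
b = np.column_stack((X, X[:, 0] ** 2))

print(a.shape, np.array_equal(a, b))
```

Both return a new (100, 3) array; X itself is left unchanged, which fits the fixed-size nature of NumPy arrays mentioned in the question.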

numpy representing array of matrices

I have an array of 2x2 complex matrices that represents a transformation of a scattering matrix over time. For my calculations I need a way to multiply such arrays by each other (matrix multiplication); multiply each matrix in the array by another matrix; and apply a transformation to all matrices in the array.
I've tried multiple ways of doing so with NumPy (4 column array, array of arrays, array of matrices, list of matrices), but each of them, while providing a nice interface for some of the required functions, makes the rest very awkward.
So here's the question - what is the best way to represent such structures and how would I carry out the required transformations over them?
examples
Initially, the data is in a CSV file (simulated here with np.arange):
import numpy as np
csv = np.arange(45.).reshape(5,9)
t = np.array(csv[:,0]) # time array
4 column array
transform csv to 4 column array:
data = np.apply_along_axis(lambda x: [x[1]+1j*x[2],
x[3]+1j*x[4],
x[5]+1j*x[6],
x[7]+1j*x[8]],1,csv)
array x matrix:
m = np.array([[1,0],[0,0]])
np.apply_along_axis(lambda x: (x.reshape(2,2).dot(m)).reshape(1,4),1,data)
array x array:
would probably require a for loop and array preallocation
transformation:
np.apply_along_axis(lambda x: [-(x[0]*x[3]-x[1]*x[2])/x[2],
x[0]/x[2],
-x[3]/x[2],
1/x[2]],1,data)
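For the 4-column layout, the "array x array" step does not need a loop: each row can be viewed temporarily as a 2x2 matrix, multiplied in one batched call, and flattened back. A minimal sketch reusing the simulated csv; building data by slicing (csv[:, 1::2] + 1j * csv[:, 2::2]) is a substitution for the apply_along_axis call above and yields the same values.

```python
import numpy as np

csv = np.arange(45.).reshape(5, 9)

# Build the (N, 4) complex layout without apply_along_axis:
# columns 1,3,5,7 are real parts, columns 2,4,6,8 imaginary parts
data = csv[:, 1::2] + 1j * csv[:, 2::2]          # shape (5, 4)

# Array x array: view each row as a 2x2 matrix, multiply pairwise, flatten back
prod = (data.reshape(-1, 2, 2) @ data.reshape(-1, 2, 2)).reshape(-1, 4)

# Check against an explicit loop over rows
expected = np.array([(r.reshape(2, 2) @ r.reshape(2, 2)).ravel() for r in data])
print(prod.shape, np.allclose(prod, expected))
```

The reshape uses the same row-major order as x.reshape(2,2) in the array x matrix example, so the 4-column and 2x2 views stay consistent.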
list of arrays
transform csv to list of arrays:
data = [np.array([[i[1]+1j*i[2],
i[3]+1j*i[4]],
[i[5]+1j*i[6],
i[7]+1j*i[8]]]) for i in csv]
array x matrix:
m = np.array([[1,0],[0,0]])
[i.dot(m) for i in data]
array x array:
[data[i].dot(data[i]) for i in range(len(data))]
transformation:
[np.array([[-(np.linalg.det(x))/x[1,0],
x[0,0]/x[1,0]],
[-x[1,1]/x[1,0],
1/x[1,0]]]) for x in data]
array of matrices
transform csv to array of matrices:
data = np.apply_along_axis(lambda x: [[x[1]+1j*x[2],
x[3]+1j*x[4]],
[x[5]+1j*x[6],
x[7]+1j*x[8]]],1,csv)
array x matrix:
m = np.array([[1,0],[0,0]])
data.dot(m)
array x array:
would probably require a for loop and array preallocation
data * data # element-wise, not a matrix product
transformation:
would probably require a for loop and array preallocation
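One commonly used answer is the (N, 2, 2) "array of matrices" layout combined with np.matmul (the @ operator), which treats the leading axis as a stack of matrices and multiplies them pairwise. A minimal sketch under that assumption, reusing the simulated csv: it covers all three operations without loops, and the transformation is written with the same formula as the 4-column version above (a=x[0,0], b=x[0,1], c=x[1,0], d=x[1,1]).

```python
import numpy as np

csv = np.arange(45.).reshape(5, 9)

# Array of matrices, shape (N, 2, 2), built by slicing instead of apply_along_axis
data = (csv[:, 1::2] + 1j * csv[:, 2::2]).reshape(-1, 2, 2)

m = np.array([[1, 0], [0, 0]])

# Each matrix times a single matrix: m broadcasts across the leading axis
dm = data @ m

# Array x array: matmul multiplies matching matrices pairwise, no loop needed
dd = data @ data        # same as np.einsum('nij,njk->nik', data, data)

# Transformation applied to all matrices at once
a, b, c, d = data[:, 0, 0], data[:, 0, 1], data[:, 1, 0], data[:, 1, 1]
det = np.linalg.det(data)  # batched determinant, shape (N,)
transformed = np.stack([np.stack([-det / c, a / c], axis=-1),
                        np.stack([-d / c, 1 / c], axis=-1)], axis=-2)

print(dm.shape, dd.shape, transformed.shape)
```

The design choice is to keep one contiguous (N, 2, 2) array: batched functions such as np.matmul and np.linalg.det operate on the last two axes, and per-element formulas become plain vectorized arithmetic on the four component arrays.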
