How to concatenate two vectors of different dimensions - Python

If X = [[ 1. 1. 1. 1. 1. 1.]] and Y = [[ 0. 0. 0. 0.]], how can I concatenate the two vectors along the column axis to form a single vector?
I did the following but it didn't work:
import tensorflow as tf
X = tf.constant(1.0, shape=[1, 6])
Y = tf.zeros(shape=[1,4])
XY = tf.concat((X, Y), axis=0)  # fails: tensors must match in every dimension except axis 0
sess = tf.Session()
print(sess.run(XY))

If you want to concatenate them along axis 0, their sizes along every other axis must be equal, which is not the case here.
Assuming that you don't want that,
you need to set axis = 1 in the tf.concat method.
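A minimal corrected sketch, keeping the question's TF1 session API:
import tensorflow as tf

X = tf.constant(1.0, shape=[1, 6])
Y = tf.zeros(shape=[1, 4])
XY = tf.concat((X, Y), axis=1)  # (1, 6) and (1, 4) concatenate to (1, 10)

sess = tf.Session()
print(sess.run(XY))
# [[1. 1. 1. 1. 1. 1. 0. 0. 0. 0.]]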

Related

Why np.corrcoef is not working as expected with two vectors of dimension two?

I am trying to calculate the Pearson correlation coefficient between two vectors of dimension two using np.corrcoef. When the dimension of the vectors is different from two, it works fine; see for example:
import numpy as np
x = np.random.uniform(-10, 10, 3)
y = np.random.uniform(-10, 10, 3)
print(x, y)
print(np.corrcoef(x,y))
Output:
[-6.59840638 -1.81100446 5.6158669 ] [ 6.7200348 -7.0373677 -2.11395157]
[[ 1. -0.53299763]
[-0.53299763 1. ]]
However, when the dimension is exactly two, the correlation comes out wrong, taking only the values 1 or -1:
import numpy as np
x = np.random.uniform(-10, 10, 2)
y = np.random.uniform(-10, 10, 2)
print(x, y)
print(np.corrcoef(x,y))
Output 1:
[-2.61268708 8.32602293] [6.42020314 3.43806504]
[[ 1. -1.]
[-1. 1.]]
Output 2:
[ 5.04249697 -3.6599369 ] [6.12936665 3.15827974]
[[1. 1.]
[1. 1.]]
Output 3:
[7.33503682 7.7145613 ] [-9.54304108 7.43840944]
[[1. 1.]
[1. 1.]]
Question: What's happening and how to solve it?
There are a couple of misunderstandings leading to your confusion:
I'll use row-major order, as numpy does: "Each row of x represents a variable, and each column a single observation of all those variables."
The Pearson correlation coefficient describes the linear relationship between two variables. If you only have two observations for each variable, you can always fit a line exactly through the two points; after normalization, you'll therefore always get 1 or -1.
A covariance or correlation matrix is usually calculated amongst the components of a random vector X = (X1, ..., Xn).T. When you say you want the correlation between two vectors, it is unclear whether you actually want the cross-correlation between X and Y, in which case you need np.correlate.
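A quick sketch illustrating both points (the sample values are taken from Output 1 above):
import numpy as np

x = np.array([-2.61268708, 8.32602293])
y = np.array([6.42020314, 3.43806504])

# With two observations, the centered vectors are always of the form [-d, +d],
# so the normalized dot product can only come out as +1 or -1:
xc, yc = x - x.mean(), y - y.mean()
r = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
print(r)  # -1.0

# If you want the cross-correlation of the two sequences instead:
print(np.correlate(x, y, mode='full'))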

Multiply same numpy array with scalars multiple times

I have a 3D NumPy array of size (9,9,200) and a 2D array of size (200,200).
For each output channel, I want to take the 200 input channels of shape (9,9,1), multiply each one by one scalar from a single row of the 2D array to get an array (9,9,200), and average it such that the resultant array is (9,9,1).
Basically, if there are n channels in an input array, I want each channel multiplied n times and averaged - and this should happen for all channels. Is there an efficient way to do so?
So far what I have is this -
import numpy as np
arr = np.random.rand(9,9,200)
nchannel = arr.shape[-1]
transform = np.array([np.random.uniform(low=0.0, high=1.0, size=(nchannel,)) for i in range(nchannel)])
for channel in range(nchannel):
    # The below line needs optimization
    temp = [arr[:,:,i] * transform[channel][i] for i in range(nchannel)]
    arr[:,:,channel] = np.sum(temp, axis=0)/nchannel
Edit: A sample image (omitted here) demonstrated what I am looking for, with nchannel = 3.
The input image is arr; the final image is the transformed arr.
EDIT:
import numpy as np
n_channels = 3
scalar_size = 2
t = np.ones((n_channels,scalar_size,scalar_size)) # scalar array
m = np.random.random((n_channels,n_channels)) # letters array
print(m)
print(t)
m_av = np.mean(m, axis=1)
print(m_av)
for i in range(n_channels):
    t[i] = t[i] * m_av[i]  # scale each scalar block by its row average
print(t)
output:
[[0.04601533 0.05851365 0.03893352]
[0.7954655 0.08505869 0.83033369]
[0.59557455 0.09632997 0.63723506]]
[[[1. 1.]
[1. 1.]]
[[1. 1.]
[1. 1.]]
[[1. 1.]
[1. 1.]]]
[0.04782083 0.57028596 0.44304653]
[[[0.04782083 0.04782083]
[0.04782083 0.04782083]]
[[0.57028596 0.57028596]
[0.57028596 0.57028596]]
[[0.44304653 0.44304653]
[0.44304653 0.44304653]]]
What you're asking for is a simple matrix multiplication along the last axis:
import numpy as np
arr = np.random.rand(9,9,200)
transform = np.random.uniform(size=(200, 200)) / 200  # dividing by 200 bakes the averaging into the transform
arr = arr @ transform  # (9, 9, 200) @ (200, 200) -> (9, 9, 200)
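Equivalently, np.einsum makes the contraction explicit. A sketch, assuming the question's orientation where transform[channel] holds the scalars for one output channel (i.e. the transpose of the product above), with the averaging written out explicitly:
import numpy as np

arr = np.random.rand(9, 9, 200)
transform = np.random.uniform(size=(200, 200))
nchannel = arr.shape[-1]

# out[h, w, c] = mean over i of arr[h, w, i] * transform[c, i]
out = np.einsum('hwi,ci->hwc', arr, transform) / nchannel
assert np.allclose(out, arr @ transform.T / nchannel)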

Convert probability vector into target vector in python?

I am doing logistic regression on the iris dataset from sklearn; I know the math and am trying to implement it. At the final step, I get a prediction vector, which represents the probability of each data point belonging to class 1 or class 2 (binary classification).
Now I want to turn this prediction vector into a target vector: if the probability is greater than 50%, the corresponding data point belongs to class 1, otherwise class 2. Use 0 to represent class 1 and 1 for class 2.
I know there is a for-loop version of this that loops through the whole vector. But when the size gets large, a for loop is very expensive, so I want to do it more efficiently with vectorized numpy operations.
Any suggestion on the faster method?
import numpy as np
a = np.matrix('0.1 0.82')
print(a)
a[a > 0.5] = 1
a[a <= 0.5] = 0
print(a)
Output:
[[ 0.1 0.82]]
[[ 0. 1.]]
Update:
import numpy as np
a = np.matrix('0.1 0.82')
print(a)
a = np.where(a > 0.5, 1, 0)
print(a)
A more general solution for a 2D array that holds many vectors with many classes:
import numpy as np
a = np.array([[.5, .3, .2],
              [.1, .2, .7],
              [ 1,  0,  0]])
idx = np.argmax(a, axis=-1)
a = np.zeros(a.shape)
a[np.arange(a.shape[0]), idx] = 1
print(a)
Output:
[[1. 0. 0.]
[0. 0. 1.]
[1. 0. 0.]]
Option 1: If you are doing binary classification and have a 1d prediction vector, then your solution is numpy.round:
prob = model.predict(X_test)
Y = np.round(prob)
Option 2: If you have an n-dimensional one-hot prediction matrix but want labels, then you can use numpy.argmax. This returns a 1d vector of labels:
prob = model.predict(X_test)
y = np.argmax(prob, axis=1)
In case you want to proceed with a confusion matrix etc. afterwards and get back the original format of a target variable in scikit, array([1 0 ... 1]), you can use:
a = clf.predict_proba(X_test)[:,1]
a = np.where(a>0.5, 1, 0)
The [:,1] refers to the second class (in my case: 1); the first class in my case was 0.
For multi-class, or as a more generalized solution, use
np.argmax(y_hat, axis=1)

How to change chunks of data in a numpy array

I have a large numpy 1-dimensional array of data in Python and want entries x (500) to y (520) to be changed to equal 1. I could use a for loop, but is there a neater, faster numpy way of doing this?
for x in range(500, 520):
    numpyArray[x] = 1.
That is the for loop that could be used, but it seems like there could be a function in numpy that I'm missing - I'd rather not use the masked arrays that numpy offers.
You can use slicing with [] to assign to a range of elements:
import numpy as np
a = np.ones(10)
print(a) # Original array
# [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
startindex = 2
endindex = 4
a[startindex:endindex] = 0
print(a) # modified array
# [ 1. 1. 0. 0. 1. 1. 1. 1. 1. 1.]
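Applied to the question's case (a sketch; the array of zeros stands in for the question's large 1-D data array):
import numpy as np
numpyArray = np.zeros(1000)
numpyArray[500:520] = 1.0  # entries 500..519; the stop index is exclusive
print(numpyArray[498:522])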

Simple implementation of NumPy cov (covariance) function

I was trying to implement the numpy.cov() function as given here: numpy cov (covariance) function, what exactly does it compute?, but I am getting some bizarre results. Please correct me:
import numpy as np
def my_covar(X):
    X -= X.mean(axis=0)
    N = X.shape[1]
    return np.dot(X, X.T.conj())/float(N-1)
X = np.asarray([[1.0,1.0],[2.0,2.0],[3.0,3.0]])
## Run NumPy's implementation
print(np.cov(X))
"""
NumPy's output:
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
"""
## Run my implementation
print(my_covar(X))
"""
My output:
[[ 2. 0. -2.]
[ 0. 0. 0.]
[ -2. 0. 2.]]
"""
What is going wrong?
Both your function and np.cov (by default) assume that the rows of X correspond to variables, and the columns correspond to observations.
When you center X by subtracting the mean, you need to compute the mean over observations, i.e. the columns of X rather than the rows:
X -= X.mean(axis=1)[:, None]
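Putting it together, a corrected sketch that matches np.cov (copying X first so the caller's array is not mutated in place):
import numpy as np

def my_covar(X):
    X = X - X.mean(axis=1)[:, None]  # center each variable (row) over its observations
    N = X.shape[1]                   # number of observations (columns)
    return np.dot(X, X.T.conj()) / float(N - 1)

X = np.random.rand(3, 5)  # 3 variables, 5 observations
print(np.allclose(my_covar(X), np.cov(X)))  # True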
