How to shuffle a matrix and an array accordingly


Suppose I have an m x d matrix called X and an m x 1 array called Y (using NumPy). The rows of X correspond to the rows of Y.
Now suppose I need to shuffle the data (the rows) in X. I used:
random.shuffle(X)
Is there a way for me to keep track of the way X has been shuffled, so I could shuffle Y accordingly?
Thank you :)

You can use numpy.random.permutation to create a permuted array of row indices, and then shuffle both X and Y using those same indices:
>>> import numpy
>>> m = 10
>>> X = numpy.random.rand(m, m)
>>> Y = numpy.random.rand(m)
>>> indices = numpy.random.permutation(m)
>>> indices
array([4, 7, 6, 9, 0, 3, 1, 2, 8, 5])
>>> Y
array([ 0.53867012, 0.6700051 , 0.06199551, 0.51248468, 0.4990566 ,
        0.81435935, 0.16030748, 0.96252029, 0.44897724, 0.98062564])
>>> Y = Y[indices]
>>> Y
array([ 0.4990566 , 0.96252029, 0.16030748, 0.98062564, 0.53867012,
        0.51248468, 0.6700051 , 0.06199551, 0.44897724, 0.81435935])
>>> X = X[indices, :]
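Below is a self-contained sketch of the same approach, written against NumPy's Generator API (np.random.default_rng, available since NumPy 1.17). The toy X and Y are made up so that each row's label can be checked. Python's random.shuffle is not a safe choice for a 2-D NumPy array: it swaps rows through views and can end up duplicating rows.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only to make the run reproducible

m, d = 6, 3
X = np.arange(m * d).reshape(m, d)  # toy data: row i is [3i, 3i+1, 3i+2]
Y = np.arange(m)                    # Y[i] is the label of row i

# Draw ONE permutation of the row indices and apply it to both arrays
perm = rng.permutation(m)
X_shuffled = X[perm]
Y_shuffled = Y[perm]

# perm records how X was shuffled, so argsort(perm) undoes the shuffle
X_restored = X_shuffled[np.argsort(perm)]

for row, label in zip(X_shuffled, Y_shuffled):
    print(label, row)  # each label still sits next to its own row
```

Keeping `perm` around answers the "keep track" part of the question: any other array aligned with X can be reordered with the same indices later.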

Related

Numpy python - calculating sum of columns from irregular dimension

I have a multi-dimensional array of scores, and I need to sum each column at the 3rd level in Python. I am using NumPy to do this.
import numpy as np
Data is something like:
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
This should return:
[[3 8 8] [5 3 8]]
This works correctly with:
np_array = np.array(score_list)
sum_array = np_array.sum(axis=0)
print(sum_array)
However, if the inner lists have irregular lengths, like this:
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
I expect it to return:
[[3 8] [5 3 8]]
However, NumPy emits a warning and the return value is:
[list([1, 1, 2, 7]) list([1, 2, 5, 4, 1, 3])]
How can I get expected result?

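NumPy tries to turn score_list into a regular n-dimensional array. Because the inner lists have different lengths, it can only build a 2x2 array of Python lists (recent NumPy versions raise a ValueError here unless you pass dtype=object), and summing Python lists concatenates them. Instead, sum each group of corresponding sublists separately using zip: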
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
import numpy as np
res = [np.sum(x,axis=0) for x in zip(*score_list)]
print(res)
[array([3, 8]), array([5, 3, 8])]
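To see why the original attempt concatenates instead of adding, this sketch builds the ragged 2x2 object array by hand (filling an np.empty array is just a way to get a predictable shape for the demo) and compares its sum with the zip approach.

```python
import numpy as np

score_list = [
    [[1, 1], [1, 2, 5]],
    [[2, 7], [4, 1, 3]],
]

# A 2x2 array whose elements are plain Python lists
ragged = np.empty((2, 2), dtype=object)
for i, row in enumerate(score_list):
    for j, sub in enumerate(row):
        ragged[i, j] = sub

# Summing along axis 0 applies Python's + to lists, which concatenates them
concatenated = ragged.sum(axis=0)
print(concatenated)  # [list([1, 1, 2, 7]) list([1, 2, 5, 4, 1, 3])]

# zip pairs the j-th sublists of every row; each pair is a regular 2-D block
sums = [np.sum(group, axis=0) for group in zip(*score_list)]
print([s.tolist() for s in sums])  # [[3, 8], [5, 3, 8]]
```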

Here is one solution. Keep in mind that it doesn't use NumPy and will be slow for larger matrices, though it runs fine for small ones. It accumulates the sums into score_list[0], so score_list is modified in place.
# Create matrix
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
# Get each row
for i in range(1, len(score_list)):
    # Get each list within the row
    for j in range(len(score_list[i])):
        # Get each value in each list
        for k in range(len(score_list[i][j])):
            # Add current value to the same index
            # on the first row
            score_list[0][j][k] += score_list[i][j][k]
print(score_list[0])
There is bound to be a better solution but this is a temporary fix for you :)
Edit. Made more efficient
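If you want to stay in pure Python without modifying score_list in place (the loop above accumulates into score_list[0]), the same column sums can be sketched with two nested zip calls. This also handles the irregular data from the question, with one caveat: zip silently truncates if corresponding sublists in different rows have different lengths.

```python
score_list = [
    [[1, 1], [1, 2, 5]],
    [[2, 7], [4, 1, 3]],
]

# zip(*score_list) groups the j-th sublist of every row;
# zip(*group) then groups the k-th value of those sublists
totals = [[sum(values) for values in zip(*group)] for group in zip(*score_list)]
print(totals)  # [[3, 8], [5, 3, 8]]
```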

A possible solution:
a = np.vstack([np.array(score_list[x], dtype='object')
for x in range(len(score_list))])
[np.add(*[x for x in a[:, i]]) for i in range(a.shape[1])]
Another possible solution:
a = sum(score_list, [])
b = [a[x] for x in range(0,len(a),2)]
c = [a[x] for x in range(1,len(a),2)]
[np.add(x[0], x[1]) for x in [b, c]]
Output:
[array([3, 8]), array([5, 3, 8])]

Python numpy array values get rounded after boolean indexing

I want to apply a calculation only to the values that are higher than a threshold. After doing it with boolean indexing, the values get rounded. How can I prevent this?
import math
import numpy as np
starting_score = 1
threshold = 5
x = np.array([0,1,2,3,4,5,6,7,8,9,10])
gt_idx = x > threshold
le_idx = x <= threshold
decay = math.log(2) / 10
y = starting_score * np.exp(-decay * x)
x[gt_idx] = starting_score * np.exp(-decay * x[gt_idx])
y
array([1. , 0.93303299, 0.87055056, 0.8122524 , 0.75785828,
0.70710678, 0.65975396, 0.61557221, 0.57434918, 0.53588673,
0.5 ])
x
array([0, 1, 2, 3, 4, 5, 0, 0, 0, 0, 0])
When applied to the full array, I get the correct y array.
When applied to part of x, the values get selected properly, but they are rounded to 0.
My expected output is:
array([0, 1, 2, 3, 4, 5, 0.65975396, 0.61557221, 0.57434918, 0.53588673, 0.5])

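When you create a NumPy array from Python integers, as with x, NumPy gives it an integer dtype (np.int32 on Windows with NumPy versions before 2.0, np.int64 on most other platforms). Assigning floats into an integer array silently truncates them toward zero. To get floating-point results, you have two options: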
# np.float32 or np.float64
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float64) # way 1
x = x.astype(np.float64) # way 2
This is not needed for y, because y is a new array computed from np.exp(-decay * x), which already has a float dtype.
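A minimal sketch of the truncation and two fixes, reusing the question's starting_score, threshold and decay. The np.where variant is an additional option not used in the answers here: it builds a new float array instead of assigning into x.

```python
import math
import numpy as np

starting_score = 1
threshold = 5
decay = math.log(2) / 10

x = np.arange(11)   # integer dtype, like np.array([0, 1, ..., 10])
gt_idx = x > threshold

# Assigning float results into an integer array truncates them toward zero
x_int = x.copy()
x_int[gt_idx] = starting_score * np.exp(-decay * x_int[gt_idx])

# Casting to float first keeps the fractional values
x_float = x.astype(np.float64)
x_float[gt_idx] = starting_score * np.exp(-decay * x_float[gt_idx])

# Alternative: np.where picks per element and returns a new float array
x_where = np.where(gt_idx, starting_score * np.exp(-decay * x), x)

print(x_int)
print(x_float)
```

np.where evaluates the formula on every element before choosing, which is harmless here because exp is defined everywhere.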

NumPy automatically infers an integer dtype for x. To preserve your floats, you need to change the dtype of the x array:
x.dtype
# Out: dtype('int64')
x = x.astype('float64')
or declare x as an array of float64
x = np.array([0,1,2,3,4,5,6,7,8,9,10], dtype='float64')

Numpy polyfit returns an array, while I expect a single number

I have two lists, x and y, to which I want to fit a straight line.
To do so, I use the following code:
x = df[quality][df['model'].str.contains(cluster, case=False, na=False)].to_numpy()
y = df[prediction][df['model'].str.contains(cluster, case=False, na=False)].to_numpy()
slope, constant = np.polyfit(x, y, 1)
I expected the slope and the constant to each be a single number (since I am using degree 1 in np.polyfit), but instead each one is a NumPy array:
print(slope)
>>> [ 1.07032587 -0.07121294]
print(constant)
>>> [0.13656049 0.08582967]
How can I interpret these numbers? And which values can I use to fit a line?

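This happens when y is two-dimensional. np.polyfit fits every column of a 2-D y separately and returns a coefficient array of shape (deg + 1, K) for K columns, so after unpacking, slope and constant each hold one value per column: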
>>> import numpy as np
>>> x = np.array([1, 3, 5, 7])
>>> y = np.array([[1,6],[3, 3], [5,9], [7,5]])
>>> slope, constant = np.polyfit(x, y, 1)
>>> slope
array([1. , 0.15])
>>> constant
array([2.66453526e-15, 5.15000000e+00])
To check that both have the same number of dimensions:
>>> assert x.ndim == y.ndim
With a one-dimensional y, polyfit returns a single slope and intercept:
>>> x = np.array([1, 3, 5, 7])
>>> y = np.array([ 6, 3, 9, 5 ])
>>> assert x.ndim == y.ndim
>>> slope, constant = np.polyfit(x, y, 1)
>>> slope
0.14999999999999963
>>> constant
5.150000000000002
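A short sketch of the shapes involved, reusing the toy x and y from this answer. It also shows that a column vector of shape (n, 1) gives array-valued coefficients too (length-1 arrays), which can be avoided by flattening it first.

```python
import numpy as np

x = np.array([1, 3, 5, 7])
y = np.array([[1, 6], [3, 3], [5, 9], [7, 5]])  # two columns -> two separate fits

coeffs = np.polyfit(x, y, 1)
print(coeffs.shape)  # (2, 2): row 0 = slopes, row 1 = intercepts, one column per fit

# Fit only the second column by passing a 1-D y
slope, constant = np.polyfit(x, y[:, 1], 1)

# A column vector of shape (n, 1) is still 2-D; flatten it to get scalars
y_col = y[:, [1]]  # shape (4, 1)
slope2, constant2 = np.polyfit(x, y_col.ravel(), 1)
```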

How do I overwrite a row vector in a numpy array?

I am trying to normalize each row vector of the NumPy array x, but I'm facing two problems:
I'm unable to update the row vectors of x.
Is it possible to avoid the for loop with NumPy functions?
import numpy as np
x = np.array([[0, 3, 4] , [1, 6, 4]])
c = x ** 2
for i in range(0, len(x)):
    print(x[i]/np.sqrt(c[i].sum())) #prints [0. 0.6 0.8]
    x[i] = x[i]/np.sqrt(c[i].sum())
    print(x[i]) #prints [0 0 0]
print(x) #prints [[0 0 0] [0 0 0]] and wasn't updated
I've just recently started out with numpy, so any assistance would be greatly appreciated!

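About the first problem ("I'm unable to update the row vectors of x"): your np.array call has no dtype argument, so NumPy infers an integer dtype (numpy.int32 here; the default integer size depends on the platform). Assigning the float results back into the integer array truncates them to 0. If you wish to store floats in the array, add a float dtype: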
x = np.array([
    [0,3,4],
    [1,6,4]
], dtype = np.float64)
To see this, compare
x = np.array([
    [0,3,4],
    [1,6,4]
], dtype = np.float64)
print(type(x[0][0])) # output = <class 'numpy.float64'>
to
x = np.array([
    [0,3,4],
    [1,6,4]
])
print(type(x[0][0])) # output = <class 'numpy.int32'> (numpy.int64 on most Linux/macOS builds)
About the second problem (avoiding the for loop with NumPy functions), this is how I would do it:
norm1, norm2 = np.linalg.norm(x[0]), np.linalg.norm(x[1])
print(x[0] / norm1)
print(x[1] / norm2)

You can use:
x/np.sqrt((x*x).sum(axis=1))[:, None]
Example:
In [9]: x = np.array([[0, 3, 4] , [1, 6, 4]])
In [10]: x/np.sqrt((x*x).sum(axis=1))[:, None]
Out[10]:
array([[0. , 0.6 , 0.8 ],
[0.13736056, 0.82416338, 0.54944226]])

For the first question:
x = np.array([[0,3,4],[1,6,4]],dtype=np.float32)
For the second question:
x/np.sqrt(np.sum(x**2,axis=1).reshape((len(x),1)))

Given the 2-dimensional array
x = np.array([[0, 3, 4] , [1, 6, 4]])
Row-wise L2 norm of that array can be calculated with:
norm = np.linalg.norm(x, axis = 1)
print(norm)
[5. 7.28010989]
You cannot divide x of shape (2, 3) by norm of shape (2,) directly: broadcasting compares trailing dimensions, and 3 does not match 2. Adding an extra axis to norm gives it shape (2, 1), which broadcasts across each row:
# Divide by adding extra dimension
x = x / norm[:, None]
print(x)
[[0. 0.6 0.8 ]
[0.13736056 0.82416338 0.54944226]]
This solves both problems: the division returns a new float array, so nothing is truncated, and no loop is needed.
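A compact sketch combining both fixes. The keepdims=True argument of np.linalg.norm keeps the row norms as a (2, 1) column, so no manual [:, None] is needed, and an explicit float copy allows overwriting rows in place. Rows that are entirely zero would cause a division by zero (producing nan); this sketch does not guard against that.

```python
import numpy as np

x = np.array([[0, 3, 4], [1, 6, 4]])

# keepdims=True returns norms with shape (2, 1), which broadcasts across each row
norms = np.linalg.norm(x, axis=1, keepdims=True)
x_normalized = x / norms  # true division always returns a new float array

# To overwrite rows in place, the array itself must have a float dtype
x_float = x.astype(np.float64)
x_float /= np.linalg.norm(x_float, axis=1, keepdims=True)

print(x_normalized)
```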

How to multiply element by element between matrices in Python?

Let's assume I have two matrices, each of which represents a vector:
X = np.matrix([[1],[2],[3]])
Y = np.matrix([[4],[5],[6]])
I want the output to be the result of multiplying it element by element, which means it should be:
[[4],[10],[18]]
Note that it is np.matrix and not np.array

I tested np.multiply() in IPython and it works like a charm:
In [41]: X = np.matrix([[1],[2],[3]])
In [42]: Y = np.matrix([[4],[5],[6]])
In [43]: np.multiply(X, Y)
Out[43]:
matrix([[ 4],
[10],
[18]])

Remember that NumPy matrix is a subclass of NumPy array, but it redefines * as matrix multiplication, whereas * on plain arrays is element-wise.
Therefore, you can convert your matrices to NumPy arrays and multiply them with the * operator, which is then element-wise:
>>> import numpy as NP
>>> X = NP.matrix([[1],[2],[3]])
>>> Y = NP.matrix([[4],[5],[6]])
>>> X1 = NP.array(X)
>>> Y1 = NP.array(Y)
>>> XY1 = X1 * Y1
>>> XY1
array([[ 4],
       [10],
       [18]])
>>> XY = NP.matrix(XY1)
>>> XY
matrix([[ 4],
[10],
[18]])
Alternatively, you can use a generic function for element-wise multiplication:
>>> a = NP.matrix("4 5 7; 9 3 2; 3 9 1")
>>> b = NP.matrix("5 2 9; 8 4 2; 1 7 4")
>>> ab = NP.multiply(a, b)
>>> ab
matrix([[20, 10, 63],
[72, 12, 4],
[ 3, 63, 4]])
The two approaches differ in return type: * on arrays returns an ndarray, while NP.multiply on matrices returns an np.matrix. Choose the first if the next function in your data flow requires a NumPy array, and the second if it requires a NumPy matrix.
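A short sketch contrasting the options on the question's X and Y. For np.matrix the * operator means matrix multiplication, which is why X * Y fails for two 3x1 matrices. NumPy's documentation also recommends plain arrays over np.matrix for new code.

```python
import numpy as np

X = np.matrix([[1], [2], [3]])
Y = np.matrix([[4], [5], [6]])

# For np.matrix, * is matrix multiplication: (3x1) * (3x1) is not aligned
try:
    X * Y
    star_failed = False
except ValueError:
    star_failed = True

# Element-wise product that keeps the np.matrix type
XY_matrix = np.multiply(X, Y)

# Element-wise product of plain arrays using the * operator
XY_array = np.asarray(X) * np.asarray(Y)

# With matrices, X.T * Y is the (1x3)(3x1) product, i.e. the dot product
dot = X.T * Y

print(XY_matrix.tolist(), XY_array.tolist(), dot.item())  # [[4], [10], [18]] [[4], [10], [18]] 32
```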
