Numpy - custom sort of rows and columns in array - python

Can I sort the rows or columns of an array according to values stored in a separate list?
For example:
row_keys = [10, 11, 5, 6]
z = np.array([[2.77, 11., 4.1, 7.2],
[3.7, 2.2, 1.1, 0.5],
[2.5, 3.5, 5.0, 9.0],
[4.3, 2.2, 5.1, 6.1]])
Should produce something like
array([[ 2.5, 3.5, 5. , 9. ],
[ 4.3, 2.2, 5.1, 6.1],
[ 2.77, 11. , 4.1, 7.2],
[ 3.7, 2.2, 1.1, 0.5]])
And similar functionality applied to the columns, please.

Another way for rows
z_rows = z[np.argsort(row_keys)]
and for columns (reusing row_keys as the column keys, which works here because z is square)
z_columns = z.T[np.argsort(row_keys)].T
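The two one-liners above can be put together in a small sketch. The list col_keys is a hypothetical set of column keys, not something from the question; np.ix_ is shown as one possible way to reorder rows and columns in a single step.

```python
import numpy as np

row_keys = [10, 11, 5, 6]
col_keys = [3, 1, 2, 0]  # hypothetical column keys, assumed for illustration

z = np.array([[2.77, 11., 4.1, 7.2],
              [3.7, 2.2, 1.1, 0.5],
              [2.5, 3.5, 5.0, 9.0],
              [4.3, 2.2, 5.1, 6.1]])

# argsort returns the indices that would sort the keys
row_order = np.argsort(row_keys)
col_order = np.argsort(col_keys)

z_rows = z[row_order]       # reorder rows by row_keys
z_cols = z[:, col_order]    # reorder columns; same result as z.T[col_order].T
z_both = z[np.ix_(row_order, col_order)]  # reorder rows and columns together
```

Indexing with z[:, col_order] avoids the double transpose, although both forms give the same array.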

Related

How to combine these two numpy arrays?

How would I combine these two arrays:
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
Into something like this:
xy = [[0.1, [1.0, 1.1, 1.2, 1.3]], [0.2, [2.0, 2.1, 2.2, 2.3]...
Thank you for the assistance!
Someone suggested I post the code that I have tried, and I realized I had forgotten to:
xy = np.array(list(zip(x, y)))
This is my current solution, however it is extremely inefficient.
You can use zip to combine
[[a,b] for a,b in zip(y,x)]
Out:
[[array([0.1]), array([1. , 1.1, 1.2, 1.3])],
[array([0.2]), array([2. , 2.1, 2.2, 2.3])],
[array([0.3]), array([3. , 3.1, 3.2, 3.3])],
[array([0.4]), array([4. , 4.1, 4.2, 4.3])],
[array([0.5]), array([5. , 5.1, 5.2, 5.3])]]
A pure NumPy solution will be much faster than a list comprehension for large arrays.
I do have to say your use case makes little sense to me, as I see no logic in putting these arrays into a single nested data structure, and I believe you should recheck your design.
As @user2357112 supports Monica was subtly implying, this is very likely an XY problem. Check whether this is really what you are trying to solve, and not something else. If you want something else, try asking about that.
I strongly suggest checking what you want to do before moving on, or you may lock yourself into a bad design.
That aside, here's a solution:
import numpy as np
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
xy = np.hstack([y, x])
print(xy)
prints
[[0.1 1. 1.1 1.2 1.3]
[0.2 2. 2.1 2.2 2.3]
[0.3 3. 3.1 3.2 3.3]
[0.4 4. 4.1 4.2 4.3]
[0.5 5. 5.1 5.2 5.3]]
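As a rough sketch of how the stacked array might be used afterwards: np.hstack on these 2-D inputs is equivalent to np.concatenate along axis 1, and the original pieces can be recovered by slicing columns. The names y_back and x_back are illustrative only.

```python
import numpy as np

x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
                [4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])

xy = np.hstack([y, x])                      # shape (5, 5): y is column 0
xy_concat = np.concatenate([y, x], axis=1)  # same result for 2-D inputs

# Slice the combined array back into its parts (illustrative names)
y_back = xy[:, :1]   # first column, kept 2-D with shape (5, 1)
x_back = xy[:, 1:]   # remaining four columns, shape (5, 4)
```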

How to pad an array with rows

I have a set of numpy arrays with different number of rows and I would like to pad them to a fixed number of rows, e.g.
An array "a" with 3 rows:
a = [
[1.1, 2.1, 3.1],
[1.2, 2.2, 3.2],
[1.3, 2.3, 3.3]
]
I would like to convert "a" to an array with 5 rows:
[
[1.1, 2.1, 3.1],
[1.2, 2.2, 3.2],
[1.3, 2.3, 3.3],
[0, 0, 0],
[0, 0, 0]
]
I have tried np.concatenate((a, np.zeros(3)*(5-len(a))), axis=0), but it does not work.
Any help would be appreciated.
You're looking for np.pad. To zero pad, set mode to 'constant' and give pad_width as a (before, after) pair for each axis. Here that means 0 rows before and 2 rows after, with no padding on the columns:
np.pad(a, pad_width=((0,2),(0,0)), mode='constant')
array([[1.1, 2.1, 3.1],
[1.2, 2.2, 3.2],
[1.3, 2.3, 3.3],
[0. , 0. , 0. ],
[0. , 0. , 0. ]])
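Since the question mentions several arrays with different row counts, the np.pad call can be wrapped in a small helper. This is a minimal sketch; pad_rows and its parameters are hypothetical names, not part of NumPy. It also shows one way the np.concatenate attempt from the question could work: the zero block has to be 2-D, with the same number of columns as a.

```python
import numpy as np

def pad_rows(a, n_rows, fill=0.0):
    """Return a copy of 2-D array `a` padded at the bottom to `n_rows` rows.

    Hypothetical helper for illustration; raises if `a` is already too long.
    """
    a = np.asarray(a)
    missing = n_rows - a.shape[0]
    if missing < 0:
        raise ValueError("array already has more rows than n_rows")
    # ((0, missing), (0, 0)): pad only after the last row, never the columns
    return np.pad(a, pad_width=((0, missing), (0, 0)),
                  mode='constant', constant_values=fill)

a = np.array([[1.1, 2.1, 3.1],
              [1.2, 2.2, 3.2],
              [1.3, 2.3, 3.3]])

padded = pad_rows(a, 5)

# Equivalent result with concatenate, using a 2-D block of zeros
padded_concat = np.concatenate((a, np.zeros((5 - len(a), a.shape[1]))), axis=0)
```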

Get average column value from list of arrays Python

I'm trying to get the average value of values in column 1 and column 2 of a list of arrays. I am using a dict called clusters with an index of clusterNo where I iterate through clusterNo.
print(kMeans.clusters[clusterNo])
When I print the dictionary it gives me this result:
[array([ 5.1, 3.5]), array([ 4.9, 3. ]), array([ 4.7, 3.2]), array([ 4.6, 3.1]), array([ 5. , 3.6])
etc etc..
I cannot figure out how to slice into columns and then get the average. Bear in mind they are float values so I cannot simply avg() them.
Setup
>>> import numpy as np
>>> lst = [np.array([ 5.1, 3.5]), np.array([ 4.9, 3. ]), np.array([ 4.7, 3.2]), np.array([ 4.6, 3.1]), np.array([ 5. , 3.6])]
Solution
>>> np.mean(lst, axis=0)
array([4.86, 3.28])
However, having lst as an array might be advantageous if you need to do more calculations or array operations on that data.
>>> arr = np.array(lst)
>>> arr
array([[5.1, 3.5],
[4.9, 3. ],
[4.7, 3.2],
[4.6, 3.1],
[5. , 3.6]])
>>> arr.mean(axis=0)
array([4.86, 3.28])
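Because the question iterates over a dict of clusters, the same np.mean call can be applied per key. The dict below is a made-up stand-in for kMeans.clusters, whose real contents are not shown in the question.

```python
import numpy as np

# Hypothetical stand-in for kMeans.clusters: cluster number -> list of points
clusters = {
    0: [np.array([5.1, 3.5]), np.array([4.9, 3.0]), np.array([4.7, 3.2])],
    1: [np.array([4.6, 3.1]), np.array([5.0, 3.6])],
}

# Column-wise mean of each cluster's points (axis=0 averages down the rows)
cluster_means = {no: np.mean(points, axis=0) for no, points in clusters.items()}

# A single column can also be sliced out and averaged on its own
arr = np.array(clusters[1])
first_column_mean = arr[:, 0].mean()
```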

Creating a numpy array of 3D coordinates from three 1D arrays, first index changing fastest

similar to the question here
I have three arbitrary 1D arrays, for example:
x_p = np.array((0.0,1.1, 2.2, 3.3, 4.4))
y_p = np.array((5.5,6.6,7.7))
z_p = np.array((8.8, 9.9))
I need
points = np.array([[0.0, 5.5, 8.8],
[1.1, 5.5, 8.8],
[2.2, 5.5, 8.8],
...
[4.4, 7.7, 9.9]])
1) with the first index changing fastest.
2) points are float coordinates, not integer indices.
3) I noticed that from version 1.7.0, numpy.meshgrid changed behavior, with default indexing='xy', and I need to use
np.vstack(np.meshgrid(x_p,y_p,z_p,indexing='ij')).reshape(3,-1).T
to get the result points with the last index changing fastest, which is not what I want. (It was mentioned that meshgrid supports more than 2 dimensions only from 1.7.0; I didn't check.)
I found this with some trial and error.
I think the ij vs. xy indexing has been in meshgrid forever (it's the sparse parameter that's newer). It just affects the order of the 3 returned elements.
To get x_p varying fastest I put it last in the argument list, and then used ::-1 to reverse the column order at the end.
I used stack to join the arrays on a new axis at the end, so I don't need to transpose. But the reshapes and transposes are all cheap (time-wise), so they can be used in any combination that works and is understandable.
In [100]: np.stack(np.meshgrid(z_p, y_p, x_p, indexing='ij'),3).reshape(-1,3)[:,::-1]
Out[100]:
array([[ 0. , 5.5, 8.8],
[ 1.1, 5.5, 8.8],
[ 2.2, 5.5, 8.8],
[ 3.3, 5.5, 8.8],
[ 4.4, 5.5, 8.8],
[ 0. , 6.6, 8.8],
...
[ 2.2, 7.7, 9.9],
[ 3.3, 7.7, 9.9],
[ 4.4, 7.7, 9.9]])
You might permute axes with np.transpose to achieve the output in that desired format -
np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3,1,2,0).reshape(-1,3)
Sample output -
In [104]: np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3,1,2,0).reshape(-1,3)
Out[104]:
array([[ 0. , 5.5, 8.8],
[ 1.1, 5.5, 8.8],
[ 2.2, 5.5, 8.8],
[ 3.3, 5.5, 8.8],
[ 4.4, 5.5, 8.8],
[ 0. , 6.6, 8.8],
[ 1.1, 6.6, 8.8],
[ 2.2, 6.6, 8.8],
[ 3.3, 6.6, 8.8],
[ 4.4, 6.6, 8.8],
[ 0. , 7.7, 8.8],
[ 1.1, 7.7, 8.8],
....
[ 3.3, 7.7, 9.9],
[ 4.4, 7.7, 9.9]])
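A quick way to convince yourself that the two meshgrid expressions above agree is to compare them directly. The sketch below also adds a third variant based on itertools.product, which is not from either answer; it is included only as a readable cross-check, since product varies its last argument fastest.

```python
import itertools
import numpy as np

x_p = np.array((0.0, 1.1, 2.2, 3.3, 4.4))
y_p = np.array((5.5, 6.6, 7.7))
z_p = np.array((8.8, 9.9))

# Approach 1: x_p last in meshgrid, then reverse the columns
points_stack = np.stack(np.meshgrid(z_p, y_p, x_p, indexing='ij'), 3).reshape(-1, 3)[:, ::-1]

# Approach 2: default 'xy' indexing, then permute the axes
points_transpose = np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3, 1, 2, 0).reshape(-1, 3)

# Cross-check (assumed alternative): product iterates z slowest, x fastest
points_product = np.array(list(itertools.product(z_p, y_p, x_p)))[:, ::-1]

same = (np.array_equal(points_stack, points_transpose)
        and np.array_equal(points_stack, points_product))
```

The itertools version builds a Python list first, so it is likely to be slower on large grids; it is here for checking, not as a replacement.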

The meaning of the comma inside X[:,0]

If X is an array, what is the meaning of X[:,0]? This is not the first time I have seen this notation, and it confuses me because I can't work out what it means. Could anyone show me an example? A full, clear answer on the meaning of this comma would be appreciated.
Please see the file https://github.com/lazyprogrammer/machine_learning_examples/blob/master/ann_class/forwardprop.py
The comma inside the brackets separates the rows from the columns you want to slice from your array.
x[row,column]
You can place ":" before or after the row and column values. Before the value it means "until" (up to, but not including, that index) and after the value it means "from".
For example you have:
x: array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2]])
x[:,:] means you want every row and every column.
x[3,3] means you want the single value at row index 3 and column index 3 (the fourth row and fourth column, since indexing starts at 0).
x[:3,:3] means you want the rows and columns up to, but not including, index 3.
x[:, 3] means you want every row of the column at index 3.
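The indexing forms listed above can be tried on a small array where every value is easy to trace. This is a sketch on made-up data (np.arange), not the array from the answer.

```python
import numpy as np

# 3 rows, 4 columns: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
x = np.arange(12).reshape(3, 4)

everything = x[:, :]      # every row, every column
single = x[1, 2]          # row index 1, column index 2
top_left = x[:2, :3]      # rows 0-1 and columns 0-2 (stop index is excluded)
last_column = x[:, 3]     # every row of column index 3
from_row_one = x[1:, 0]   # rows from index 1 onward, column index 0
```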
>>> x = [1, 2, 3]
>>> x[:, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
If you see that, then the variable is not a list, but something else. A numpy array, perhaps.
I am creating an example matrix:
import numpy as np
np.random.seed(0)
F = np.random.randint(2,5, size=(3, 4), dtype = 'int32' )
F
Slicing rows of the matrix (the first two rows):
F[0:2]
Slicing a column of the matrix (the column at index 2):
F[:,2]
To get straight to the point: it is X[rows, columns], as someone mentioned, but you may ask what a bare colon means. In "X[:,0]" it means "take all".
So X[:,0] -> takes the elements of all rows (a bare colon is in the row position) from the first column, so the first column of the entire matrix is returned. With a NumPy array the result is one-dimensional, with length no_of_rows.
Similarly, X[:,1] -> this would list the second column from all rows.
Hope this clarifies things.
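One detail worth checking is the shape of the result. In NumPy, an integer index drops that axis, while a slice keeps it. A small sketch on made-up data:

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])

first_col = X[:, 0]        # integer index: axis is dropped, shape (2,)
first_col_2d = X[:, 0:1]   # slice: axis is kept, shape (2, 1)
second_col = X[:, 1]       # second column from all rows
first_row = X[0, :]        # first row, all columns; same as X[0]
```

Use the slice form when later code expects a column vector with two dimensions.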
Pretty clear. Check this out!
Load some data
from sklearn import datasets
iris = datasets.load_iris()
samples = iris.data
Explore first 10 elements of 2D array
samples[:10]
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.1]])
Test our notation
x = samples[:,0]
x[:10]
array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9])
y = samples[:,1]
y[:10]
array([3.5, 3. , 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1])
P.S. The length of samples is 150, I've cut it to 10 for clarity.
