Get average column value from list of arrays Python - python

I'm trying to get the average of the values in column 1 and column 2 of a list of arrays. I am using a dict called clusters, keyed by clusterNo, and I iterate through the clusterNo values.
print(kMeans.clusters[clusterNo])
When I print the dictionary it gives me this result:
[array([ 5.1, 3.5]), array([ 4.9, 3. ]), array([ 4.7, 3.2]), array([ 4.6, 3.1]), array([ 5. , 3.6])
etc etc..
I cannot figure out how to slice into columns and then get the average. Bear in mind they are float values, so I cannot simply avg() them.

Setup
>>> import numpy as np
>>> lst = [np.array([ 5.1, 3.5]), np.array([ 4.9, 3. ]), np.array([ 4.7, 3.2]), np.array([ 4.6, 3.1]), np.array([ 5. , 3.6])]
Solution
>>> np.mean(lst, axis=0)
array([4.86, 3.28])
However, having lst as an array might be advantageous if you need to do more calculations or array operations on that data.
>>> arr = np.array(lst)
>>> arr
array([[5.1, 3.5],
[4.9, 3. ],
[4.7, 3.2],
[4.6, 3.1],
[5. , 3.6]])
>>> arr.mean(axis=0)
array([4.86, 3.28])
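Applied to the dict from the question, the same call can be run once per cluster. This is only a sketch: the clusters dict below is a made-up stand-in for kMeans.clusters, and the name centroids is introduced here for illustration.

```python
import numpy as np

# Hypothetical stand-in for kMeans.clusters:
# cluster number -> list of 2-element arrays.
clusters = {
    0: [np.array([5.1, 3.5]), np.array([4.9, 3.0]), np.array([4.7, 3.2])],
    1: [np.array([6.0, 2.0]), np.array([7.0, 4.0])],
}

# axis=0 averages down the rows, giving one mean per column.
centroids = {no: np.mean(points, axis=0) for no, points in clusters.items()}

for no, centre in centroids.items():
    print(no, centre)
```

Each value in centroids is a 2-element array holding the mean of column 1 and the mean of column 2 for that cluster.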

Related

How to pad an array with rows

I have a set of numpy arrays with different numbers of rows and I would like to pad them to a fixed number of rows, e.g.
An array "a" with 3 rows:
a = [
[1.1, 2.1, 3.1]
[1.2, 2.2, 3.2]
[1.3, 2.3, 3.3]
]
I would like to convert "a" to an array with 5 rows:
[
[1.1, 2.1, 3.1]
[1.2, 2.2, 3.2]
[1.3, 2.3, 3.3]
[0, 0, 0]
[0, 0, 0]
]
I have tried np.concatenate((a, np.zeros(3)*(5-len(a))), axis=0), but it does not work.
Any help would be appreciated.
You're looking for np.pad. To zero pad you must set mode to constant and the pad_width that you want on the edges of each axis:
np.pad(a, pad_width=((0,2),(0,0)), mode='constant')
array([[1.1, 2.1, 3.1],
[1.2, 2.2, 3.2],
[1.3, 2.3, 3.3],
[0. , 0. , 0. ],
[0. , 0. , 0. ]])
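Since the arrays have different numbers of rows, the pad width can be computed from the target row count instead of being hard-coded. A minimal sketch, assuming a helper named pad_rows (a hypothetical name, not part of numpy); it also shows how the concatenate attempt from the question could work if the zeros block is given a 2-D shape.

```python
import numpy as np

def pad_rows(a, n_rows):
    """Zero-pad `a` at the bottom so it has `n_rows` rows (hypothetical helper)."""
    a = np.asarray(a)
    missing = n_rows - a.shape[0]
    if missing < 0:
        raise ValueError("array already has more rows than n_rows")
    # (0, missing): no rows added before, `missing` rows added after.
    # (0, 0): columns are left untouched.
    return np.pad(a, pad_width=((0, missing), (0, 0)), mode="constant")

a = np.array([[1.1, 2.1, 3.1],
              [1.2, 2.2, 3.2],
              [1.3, 2.3, 3.3]])

padded = pad_rows(a, 5)
print(padded.shape)

# Equivalent result with concatenate: the zeros must be a 2-D block
# of shape (rows_to_add, n_columns), not a 1-D array of length 3.
alt = np.concatenate((a, np.zeros((5 - len(a), a.shape[1]))), axis=0)
```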

Why is my array not initialising? Numpy - error

I am trying to initialize an array with another after making changes to it.
I am using NumPy in Python with the default iris dataset from pydataset.
import numpy as np
from pydataset import data
iris_data=data('iris')
iris_arr=iris_data.values
sp_l = iris_arr[:,0] #sepal.length
sp_w = iris_arr[:,1] #sepal.width
sp_l = np.array(sp_l)
sp_w = np.array(sp_w)
if(sp_l.any() <= 5 and sp_w.any() <= 3):
    sp_le = np.asarray(sp_l)
    sp_we = np.asarray(sp_w)
NameError: name 'sp_le' is not defined
I expected sp_le to be initialized
I think the only problem is the condition. Your data may not satisfy it, in which case the body of the if never runs and sp_le is never assigned, so using sp_le afterwards raises the NameError. Print sp_l and sp_w to check their values. Also, as hpaulj posted, if you want to test whether sp_l has any elements less than or equal to 5, use (sp_l <= 5).any().
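The difference between the two spellings can be seen on a tiny made-up sample (the values below are illustrative, not taken from the dataset):

```python
import numpy as np

# Made-up sample in which every value is greater than 5.
sp_l = np.array([5.1, 5.4, 5.8])

# .any() first collapses the array to a single bool (True, because some
# value is non-zero); the comparison is then effectively True <= 5.
reduce_then_compare = bool(sp_l.any() <= 5)

# Comparing first gives an element-wise boolean array,
# which .any() then reduces to a single answer.
compare_then_reduce = bool((sp_l <= 5).any())

print(reduce_then_compare)   # True, even though no value is <= 5
print(compare_then_reduce)   # False
```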
I can load the iris dataset from sklearn with:
In [317]: from sklearn.datasets import load_iris
In [321]: arr = load_iris().data
In [322]: arr.shape
Out[322]: (150, 4)
The result is 2d array; first 5 rows are:
In [323]: arr[:5,:]
Out[323]:
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2]])
first and second columns are:
In [324]: sp_l = arr[:,0]
In [325]: sp_w = arr[:,1]
In [326]: sp_l.shape
Out[326]: (150,)
sp_l.any() just tests if any values are not 0. I don't think you want that.
sp_l<=5 tests if values of sp_l are less than or equal to 5
In [327]: (sp_l<=5).any()
Out[327]: True # at least some are
In [328]: (sp_l<=5).sum()
Out[328]: 32 # there are 32 true values in that test
In [329]: (sp_w<=3).sum()
Out[329]: 83 # and 83 sp_w values are small enough.
It's unclear what you want, but one possibility is that you want the rows where sp_l is 5 or less and sp_w is 3 or less.
In [330]: (sp_l<=5)&(sp_w<=3) # the () and & are important
Out[330]:
array([False, True, False, False, False, False, False, False, True,
False, ... False])
In [331]: ((sp_l<=5)&(sp_w<=3)).sum()
Out[331]: 12
We get the indices of those rows with where:
In [332]: idx = np.where(((sp_l<=5)&(sp_w<=3)))
In [333]: idx
Out[333]: (array([ 1, 8, 12, 13, 25, 38, 41, 45, 57, 60, 93, 106]),)
and the actual rows:
In [334]: arr[idx[0]]
Out[334]:
array([[4.9, 3. , 1.4, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.8, 3. , 1.4, 0.1],
[4.3, 3. , 1.1, 0.1],
[5. , 3. , 1.6, 0.2],
[4.4, 3. , 1.3, 0.2],
[4.5, 2.3, 1.3, 0.3],
[4.8, 3. , 1.4, 0.3],
[4.9, 2.4, 3.3, 1. ],
[5. , 2. , 3.5, 1. ],
[5. , 2.3, 3.3, 1. ],
[4.9, 2.5, 4.5, 1.7]])
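The boolean mask can also be used to index the array directly, without going through np.where. A small sketch using only the first five rows shown earlier, so the selection here is much shorter than the 12 rows above:

```python
import numpy as np

# First five iris rows, copied from the output shown earlier.
arr = np.array([[5.1, 3.5, 1.4, 0.2],
                [4.9, 3.0, 1.4, 0.2],
                [4.7, 3.2, 1.3, 0.2],
                [4.6, 3.1, 1.5, 0.2],
                [5.0, 3.6, 1.4, 0.2]])

# Element-wise test on columns 0 and 1; the parentheses and & are required.
mask = (arr[:, 0] <= 5) & (arr[:, 1] <= 3)

# Boolean indexing keeps the rows where mask is True.
selected = arr[mask]

# Same rows as going through the integer indices from np.where.
via_where = arr[np.where(mask)[0]]
print(selected)
```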

Numpy - custom sort of rows and columns in array

Can I sort the rows or columns of an array according to values stored in a separate list?
For example:
row_keys = [10, 11, 5, 6]
z = np.array([[2.77, 11., 4.1, 7.2],
[3.7, 2.2, 1.1, 0.5],
[2.5, 3.5, 5.0, 9.0],
[4.3, 2.2, 5.1, 6.1]])
Should produce something like
array([[ 2.5, 3.5, 5. , 9. ],
[ 4.3, 2.2, 5.1, 6.1],
[ 2.77, 11. , 4.1, 7.2],
[ 3.7, 2.2, 1.1, 0.5],
])
And similar functionality applied to the columns, please.
Another way for rows
z_rows = z[np.argsort(row_keys)]
and for columns
z_columns = z.T[np.argsort(row_keys)].T
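Here is the same idea run on the data from the question. The double transpose for columns can also be written as a column index, z[:, order]; the name order is introduced here for illustration.

```python
import numpy as np

row_keys = [10, 11, 5, 6]
z = np.array([[2.77, 11., 4.1, 7.2],
              [3.7, 2.2, 1.1, 0.5],
              [2.5, 3.5, 5.0, 9.0],
              [4.3, 2.2, 5.1, 6.1]])

# Positions that would sort row_keys ascending: keys 5, 6, 10, 11.
order = np.argsort(row_keys)

z_rows = z[order]         # reorder rows
z_columns = z[:, order]   # reorder columns, equivalent to z.T[order].T

print(order)
print(z_rows)
```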

Sorting arrays in Python by a non-integer column

So for example I have an array that I want to sort by a column in ascending order, and it's easy to do for integers using 'sorting()', 'np.arange()', or 'np.argsort()'.
However, what if my column consists of floats?
What would you recommend?
Edit:
I mean, I have something like:
a = array([[1.7, 2, 3],
[4.5, 5, 6],
[0.1, 0, 1]])
and I want to get this:
array([[0.1, 0, 1],
[1.7, 2, 3],
[4.5, 5, 6]])
So far with argsort() I get the following error:
TypeError: only integer scalar arrays can be converted to a scalar index
You can use Python's standard sorted (or list.sort for in-place sorting), whatever the sequence contains. Just supply a custom key function (or, in Python 2 only, a custom compare function via cmp). For example, to sort a list of lists (a 2-D array) in ascending order by the 4th column:
>>> a=[[1.0,2.0,3.0,4.0], [4.0,3.0,2.0,1.0], [0,0,0,0]]
>>> from operator import itemgetter
>>> sorted(a, key=itemgetter(3))
[[0, 0, 0, 0], [4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
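The cmp argument no longer exists in Python 3. If a compare function is really needed there, functools.cmp_to_key can wrap it into a key. A small sketch, where by_fourth is a hypothetical compare function written for this example:

```python
from functools import cmp_to_key
from operator import itemgetter

a = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [0, 0, 0, 0]]

def by_fourth(r1, r2):
    """Old-style compare: negative, zero or positive, based on the 4th column."""
    return (r1[3] > r2[3]) - (r1[3] < r2[3])

with_cmp = sorted(a, key=cmp_to_key(by_fourth))
with_key = sorted(a, key=itemgetter(3))

print(with_cmp)
```

Both calls give the same ordering; the key version is usually the simpler choice.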
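In numpy you can sort along a chosen axis; by default np.sort sorts along axis=-1. Note that sorting along axis=0 sorts each column independently, so it matches the desired result here only because every column of this example happens to have the same ordering: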
>>> np.sort(a, axis=0)
array([[ 0.1, 0. , 1. ],
[ 1.7, 2. , 3. ],
[ 4.5, 5. , 6. ]])
Or in place:
>>> a.sort(axis=0)
>>> a
array([[ 0.1, 0. , 1. ],
[ 1.7, 2. , 3. ],
[ 4.5, 5. , 6. ]])
To sort just on a specific column you can use argsort(), e.g. column 0:
>>> a[np.argsort(a[:,0])]
array([[ 0.1, 0. , 1. ],
[ 1.7, 2. , 3. ],
[ 4.5, 5. , 6. ]])
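The two approaches only agree when every column has the same ordering. With a made-up array where that is not the case, the difference shows up: argsort keeps each row intact, while np.sort along axis=0 sorts every column separately.

```python
import numpy as np

# Made-up data: column 1 is ordered differently from column 0.
a = np.array([[1.7, 9.0, 3.0],
              [4.5, 5.0, 6.0],
              [0.1, 7.0, 1.0]])

# Reorder whole rows by the values in column 0; float values work fine.
by_column0 = a[np.argsort(a[:, 0])]

# Sort each column on its own; rows are no longer kept together.
independent = np.sort(a, axis=0)

print(by_column0)
print(independent)
```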

The meaning of the comma inside X[:,0]

If X is an array, what is the meaning of X[:,0]? It is not the first time I have seen this notation, and it confuses me because I can't work out what it means. Could anyone show me an example? A full, clear answer on this question of the comma would be appreciated.
Please see the file https://github.com/lazyprogrammer/machine_learning_examples/blob/master/ann_class/forwardprop.py
The comma inside the brackets separates the rows from the columns you want to slice from your array.
x[row,column]
You can place ":" before or after the row and column values. Before a value it means "up to (but not including)" and after a value it means "from".
For example you have:
x: array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2]])
x[:,:] means you want every row and every column.
x[3,3] means you want the single value at row index 3, column index 3 (indices start at 0, so this is the 4th row and the 4th column).
x[:3,:3] means you want the rows and columns up to, but not including, index 3.
x[:, 3] means you want the column at index 3 from every row.
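These cases are easy to check on a small array whose values make the positions obvious. The array below is made up for illustration and is not the one shown above.

```python
import numpy as np

# 5 rows, 4 columns, filled with 0..19 so each value reveals its position.
x = np.arange(20).reshape(5, 4)

everything = x[:, :]      # every row and every column
single = x[3, 3]          # value at row index 3, column index 3 -> 15
top_left = x[:3, :3]      # rows 0..2 and columns 0..2
last_column = x[:, 3]     # column index 3 from every row
from_row_3 = x[3:, :]     # rows from index 3 onward, every column

print(single)
print(top_left)
print(last_column)
```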
>>> x = [1, 2, 3]
>>> x[:, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
So if you see X[:, 0] working in code, the variable is not a list but something else, such as a numpy array.
I am creating an example matrix:
import numpy as np
np.random.seed(0)
F = np.random.randint(2,5, size=(3, 4), dtype = 'int32' )
F
Query cutting matrix rows:
F[0:2]
Query cutting matrix columns:
F[:,2]
To get straight to the point, it is X[rows, columns], as someone mentioned. You may ask what a bare colon means in X[:,0]: it means "select all".
So X[:,0] selects the elements of the first column from all rows, because the colon stands in the row position. The result is a 1-D array with no_of_rows elements.
Similarly, X[:,1] selects the second column from all rows.
Hope this clarifies things.
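One detail worth checking is the shape of the result. With an integer in the column position numpy drops that axis, so X[:, 0] is 1-D; a slice or a list in that position keeps it as a column. A short sketch on a made-up array:

```python
import numpy as np

X = np.arange(12).reshape(4, 3)   # 4 rows, 3 columns

first_col = X[:, 0]       # integer index: the column axis is dropped
kept_2d = X[:, 0:1]       # slice: the result stays two-dimensional
also_2d = X[:, [0]]       # list index: also stays two-dimensional

print(first_col, first_col.shape)
print(kept_2d.shape, also_2d.shape)
```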
Pretty clear. Check this out!
Load some data
from sklearn import datasets
iris = datasets.load_iris()
samples = iris.data
Explore first 10 elements of 2D array
samples[:10]
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.1]])
Test the notation
x = samples[:,0]
x[:10]
array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9])
y = samples[:,1]
y[:10]
array([3.5, 3. , 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1])
P.S. The length of samples is 150, I've cut it to 10 for clarity.
