How to pad an array with rows - python

I have a set of numpy arrays with different numbers of rows, and I would like to pad them to a fixed number of rows, e.g.
An array "a" with 3 rows:
a = [
[1.1, 2.1, 3.1]
[1.2, 2.2, 3.2]
[1.3, 2.3, 3.3]
]
I would like to convert "a" to an array with 5 rows:
[
[1.1, 2.1, 3.1]
[1.2, 2.2, 3.2]
[1.3, 2.3, 3.3]
[0, 0, 0]
[0, 0, 0]
]
I have tried np.concatenate((a, np.zeros(3)*(5-len(a))), axis=0), but it does not work.
Any help would be appreciated.

You're looking for np.pad. To zero-pad, set mode to 'constant' and give the pad_width you want on the edges of each axis:
np.pad(a, pad_width=((0, 2), (0, 0)), mode='constant')
array([[1.1, 2.1, 3.1],
       [1.2, 2.2, 3.2],
       [1.3, 2.3, 3.3],
       [0. , 0. , 0. ],
       [0. , 0. , 0. ]])
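For reference, the np.concatenate attempt in the question can also be fixed: np.zeros(3)*(5-len(a)) builds a 1-D array of three zeros multiplied by a scalar, not a block of zero rows. A minimal sketch of the corrected version, assuming a target of 5 rows:

```python
import numpy as np

a = np.array([[1.1, 2.1, 3.1],
              [1.2, 2.2, 3.2],
              [1.3, 2.3, 3.3]])

target_rows = 5
# Build a block of zeros with the missing rows and the same column count,
# then stack it below the original array.
pad_block = np.zeros((target_rows - len(a), a.shape[1]))
padded = np.concatenate((a, pad_block), axis=0)
print(padded.shape)  # (5, 3)
```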

Using np.arange to create list of coordinate pairs

I am trying to make a program faster and I found this post and I want to implement a solution that resembles the fourth case given in that question.
Here is the relevant part of the code I am using:
count = 0
hist_dat = np.zeros(r**2)
points = np.zeros((r**2, 2))
for a in range(r):
    for b in range(r):
        for i in range(N):
            for j in range(N):
                hist_dat[count] += retval(a/r, (a+1)/r, data_a[i][j])*retval(b/r, (b+1)/r, data_b[i][j])/N
        points[count][0], points[count][1] = (a+0.5)/r, (b+0.5)/r
        count += 1
What this code does is generate the values of a normalized 2D histogram (with "r" divisions in each direction) and the coordinates for those values as numpy.ndarray.
As you can see in the other question linked, I am currently using the second worst possible solution and it takes several minutes to run.
For starters I want to change what the code is doing for the points array (I think that once I can see how that is done, I could figure something out for hist_dat), which is basically this:
In the particular case I am working on, both A and B are the same. So for example, it could be like going from array([0, 0.5, 1]) to array([[0,0], [0,0.5], [0,1], [0.5,0], [0.5,0.5], [0.5,1], [1,0], [1,0.5], [1,1]])
Is there any numpy.ndarray method, or an operation with np.arange(), that does what the example above shows without requiring for loops?
Or is there any alternative that can do this as fast as what the linked post showed for np.arange()?
You can use np.c_ to combine the result of np.repeat and np.tile:
import numpy as np

start = 0.5
end = 5.5
step = 1.0
points = np.arange(start, end, step)  # [0.5, 1.5, 2.5, 3.5, 4.5]
n_elements = len(points)
output = np.c_[np.repeat(points, n_elements), np.tile(points, n_elements)]
print(output)
Output:
[[0.5 0.5]
[0.5 1.5]
[0.5 2.5]
[0.5 3.5]
[0.5 4.5]
[1.5 0.5]
[1.5 1.5]
[1.5 2.5]
[1.5 3.5]
[1.5 4.5]
[2.5 0.5]
[2.5 1.5]
[2.5 2.5]
[2.5 3.5]
[2.5 4.5]
[3.5 0.5]
[3.5 1.5]
[3.5 2.5]
[3.5 3.5]
[3.5 4.5]
[4.5 0.5]
[4.5 1.5]
[4.5 2.5]
[4.5 3.5]
[4.5 4.5]]
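For the histogram case in the question, the same repeat/tile pattern can build the (r**2, 2) points array of bin centers in one step. A sketch, with r standing in for the question's number of divisions:

```python
import numpy as np

r = 3  # number of divisions per axis, as in the question's code
centers = (np.arange(r) + 0.5) / r  # bin centers (a + 0.5) / r
# The outer index varies slowest (repeat) and the inner one cycles (tile),
# matching the nested a/b loop order in the question.
points = np.c_[np.repeat(centers, r), np.tile(centers, r)]
print(points.shape)  # (9, 2)
```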
Maybe np.mgrid would help?
import numpy as np
np.mgrid[0:2:0.5, 0:2:0.5].reshape(2, 4**2).T
Output:
array([[0. , 0. ],
[0. , 0.5],
[0. , 1. ],
[0. , 1.5],
[0.5, 0. ],
[0.5, 0.5],
[0.5, 1. ],
[0.5, 1.5],
[1. , 0. ],
[1. , 0.5],
[1. , 1. ],
[1. , 1.5],
[1.5, 0. ],
[1.5, 0.5],
[1.5, 1. ],
[1.5, 1.5]])

Why is my array not initialising? Numpy - error

I am trying to initialize an array with another after making changes to it, using NumPy functions in Python on a default pydataset dataset.
import numpy as np
from pydataset import data
iris_data=data('iris')
iris_arr=iris_data.values
sp_l = iris_arr[:,0] #sepal.length
sp_w = iris_arr[:,1] #sepal.width
sp_l = np.array(sp_l)
sp_w = np.array(sp_w)
if (sp_l.any() <= 5 and sp_w.any() <= 3):
    sp_le = np.asarray(sp_l)
    sp_we = np.asarray(sp_w)
NameError: name 'sp_le' is not defined
I expected sp_le to be initialized
I think the only problem is the condition expression: the data you are using may not pass the condition, so sp_le is never assigned before you use it below. Print the values of sp_l and sp_w and check that they are what you expect. Also, as hpaulj posted, if you want to test whether sp_l has elements smaller than or equal to 5, it is better to use (sp_l <= 5).any().
I can load the iris dataset from sklearn with:
In [317]: from sklearn.datasets import load_iris
In [321]: arr = load_iris().data
In [322]: arr.shape
Out[322]: (150, 4)
The result is a 2d array; the first 5 rows are:
In [323]: arr[:5,:]
Out[323]:
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2]])
first and second columns are:
In [324]: sp_l = arr[:,0]
In [325]: sp_w = arr[:,1]
In [326]: sp_l.shape
Out[326]: (150,)
sp_l.any() just tests if any values are not 0. I don't think you want that.
sp_l<=5 tests if values of sp_l are less than or equal to 5
In [327]: (sp_l<=5).any()
Out[327]: True # at least some are
In [328]: (sp_l<=5).sum()
Out[328]: 32 # there are 32 true values in that test
In [329]: (sp_w<=3).sum()
Out[329]: 83 # and 83 sp_w values are small enough.
It's unclear what you want, but one possibility is that you want the rows where sp_l is 5 or less and sp_w is 3 or less.
In [330]: (sp_l<=5)&(sp_w<=3) # the () and & are important
Out[330]:
array([False, True, False, False, False, False, False, False, True,
False, ... False])
In [331]: ((sp_l<=5)&(sp_w<=3)).sum()
Out[331]: 12
We get the indices of those rows with where:
In [332]: idx = np.where(((sp_l<=5)&(sp_w<=3)))
In [333]: idx
Out[333]: (array([ 1, 8, 12, 13, 25, 38, 41, 45, 57, 60, 93, 106]),)
and the actual rows:
In [334]: arr[idx[0]]
Out[334]:
array([[4.9, 3. , 1.4, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.8, 3. , 1.4, 0.1],
[4.3, 3. , 1.1, 0.1],
[5. , 3. , 1.6, 0.2],
[4.4, 3. , 1.3, 0.2],
[4.5, 2.3, 1.3, 0.3],
[4.8, 3. , 1.4, 0.3],
[4.9, 2.4, 3.3, 1. ],
[5. , 2. , 3.5, 1. ],
[5. , 2.3, 3.3, 1. ],
[4.9, 2.5, 4.5, 1.7]])

Get average column value from list of arrays Python

I'm trying to get the average of the values in column 1 and column 2 of a list of arrays. I am using a dict called clusters, indexed by clusterNo, and I iterate through clusterNo.
print(kMeans.clusters[clusterNo])
When I print the dictionary it gives me this result:
[array([ 5.1, 3.5]), array([ 4.9, 3. ]), array([ 4.7, 3.2]), array([ 4.6, 3.1]), array([ 5. , 3.6])
etc etc..
I cannot figure out how to slice into columns and then get the average. Bear in mind they are float values, so I cannot simply avg() them.
Setup
>>> import numpy as np
>>> lst = [np.array([ 5.1, 3.5]), np.array([ 4.9, 3. ]), np.array([ 4.7, 3.2]), np.array([ 4.6, 3.1]), np.array([ 5. , 3.6])]
Solution
>>> np.mean(lst, axis=0)
array([4.86, 3.28])
However, having lst as an array might be advantageous if you need to do more calculations or array operations on that data.
>>> arr = np.array(lst)
>>> arr
array([[5.1, 3.5],
[4.9, 3. ],
[4.7, 3.2],
[4.6, 3.1],
[5. , 3.6]])
>>> arr.mean(axis=0)
array([4.86, 3.28])
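Applied to the question's setup, the same axis=0 mean can be taken per cluster while iterating over the dict. A sketch with a hypothetical clusters dict standing in for kMeans.clusters:

```python
import numpy as np

# Hypothetical stand-in for kMeans.clusters: cluster number -> list of arrays
clusters = {
    0: [np.array([5.1, 3.5]), np.array([4.9, 3.0])],
    1: [np.array([4.7, 3.2]), np.array([4.6, 3.1]), np.array([5.0, 3.6])],
}

# Column-wise mean of each cluster's points
centroids = {k: np.mean(v, axis=0) for k, v in clusters.items()}
```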

Numpy - custom sort of rows and columns in array

Can I sort the rows or columns of an array according to values stored in a separate list?
For example:
row_keys = [10, 11, 5, 6]
z = np.array([[2.77, 11., 4.1, 7.2],
              [3.7, 2.2, 1.1, 0.5],
              [2.5, 3.5, 5.0, 9.0],
              [4.3, 2.2, 5.1, 6.1]])
Should produce something like
array([[ 2.5 ,  3.5 ,  5.  ,  9.  ],
       [ 4.3 ,  2.2 ,  5.1 ,  6.1 ],
       [ 2.77, 11.  ,  4.1 ,  7.2 ],
       [ 3.7 ,  2.2 ,  1.1 ,  0.5 ]])
And similar functionality applied to the columns, please.
Another way for rows
z_rows = z[np.argsort(row_keys)]
and for columns
z_columns = z.T[np.argsort(row_keys)].T
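If the columns have their own key list, fancy indexing on the second axis avoids the double transpose. A sketch, using a hypothetical col_keys list:

```python
import numpy as np

row_keys = [10, 11, 5, 6]
col_keys = [3, 1, 2, 0]  # hypothetical per-column keys
z = np.array([[2.77, 11., 4.1, 7.2],
              [3.7, 2.2, 1.1, 0.5],
              [2.5, 3.5, 5.0, 9.0],
              [4.3, 2.2, 5.1, 6.1]])

z_rows = z[np.argsort(row_keys)]     # reorder rows
z_cols = z[:, np.argsort(col_keys)]  # reorder columns, no transpose needed
```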

The meaning of the comma inside X[:,0]

If X is an array, what is the meaning of X[:,0]? This is not the first time I have seen such a thing, and it confuses me; I can't work out what it means. Could anyone show me an example? A full, clear answer about this comma would be appreciated.
Please see the file https://github.com/lazyprogrammer/machine_learning_examples/blob/master/ann_class/forwardprop.py
The comma inside the brackets separates the rows from the columns you want to slice from your array.
x[row, column]
You can place ":" before or after the row and column values. Before a value it means "until" and after a value it means "from".
For example you have:
x: array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2]])
x[:,:] means you want every row and every column.
x[3,3] gives the single value at row index 3, column index 3.
x[:3,:3] gives the rows and columns up to index 3.
x[:, 3] gives column index 3 for every row.
>>> x = [1, 2, 3]
>>> x[:, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
If you see that, then the variable is not a list, but something else. A numpy array, perhaps.
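A short sketch contrasting the two, since the tuple index above fails only for plain lists:

```python
import numpy as np

x_list = [[1, 2], [3, 4]]  # plain nested list
x_arr = np.array(x_list)   # 2-D numpy array

col = x_arr[:, 0]    # tuple indexing works on an ndarray -> array([1, 3])
item = x_list[0][1]  # a plain list needs chained indexing instead -> 2
```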
I am creating an example matrix:
import numpy as np
np.random.seed(0)
F = np.random.randint(2,5, size=(3, 4), dtype = 'int32' )
F
Slicing rows of the matrix:
F[0:2]
Slicing a column of the matrix:
F[:,2]
To be straight to the point: it is X[rows, columns], as someone mentioned. You may ask what just a colon : means in "X[:,0]": it means "all".
So X[:,0] lists the elements of the first column across all rows, so that entire column of the matrix is printed out; the result is a 1-D array of length no_of_rows.
Similarly, X[:,1] lists the second column from all rows.
Hope this clarifies it.
Pretty clear. Check this out!
Load some data
from sklearn import datasets
iris = datasets.load_iris()
samples = iris.data
Explore first 10 elements of 2D array
samples[:10]
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.1]])
Test our annotation
x = samples[:,0]
x[:10]
array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9])
y = samples[:,1]
y[:10]
array([3.5, 3. , 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1])
P.S. The length of samples is 150, I've cut it to 10 for clarity.
