Access columns and rows of numpy.ndarray - python

I am currently struggling with extracting certain columns and rows from a matrix stored as a numpy.ndarray.
I have a list in which I've appended these numpy.ndarrays.
This list is stored in a variable named data
print data[0].shape
outputs this
(400, 288)
According to the documentation, I understand this to mean that the matrix has 400 rows and 288 columns.
How do I extract each of the 288 columns separately?
Example:
>> import numpy as np
>> data = np.random.rand(3,3)
>> print data
[[ 0.97522481 0.57583658 0.68582806]
[ 0.88509883 0.22261933 0.84307038]
[ 0.59397925 0.51592125 0.54346909]]
How do I print the columns separately of this 3x3 matrix, first being
[0.97522481 , 0.88509883, 0.59397925 ]
without outputting the others?

Is it what you are looking for?
import numpy as np
arr = np.array([[1, 2],
[3, 4],
[5, 6]])
print(arr.shape)
# (3, 2)
print(list(arr.T))
# [array([1, 3, 5]), array([2, 4, 6])]
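To pull out a single column rather than all of them, plain slicing may be enough. A minimal sketch, reusing the small arr from above (the names first_col and columns are only illustrative):

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4],
                [5, 6]])

# arr[:, j] selects every row of column j as a 1-D array
first_col = arr[:, 0]
print(first_col)   # [1 3 5]

# Iterating over the transpose yields one column at a time
columns = [col for col in arr.T]
print(columns[1])  # [2 4 6]
```

The same slicing should apply to each (400, 288) array in the list, e.g. data[0][:, 0] for the first column of the first array.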

Related

Numpy python - calculating sum of columns from irregular dimension

I have a multi-dimensional array of scores, and I need to get the sum of each column at the third level in Python. I am using NumPy to achieve this.
import numpy as np
Data is something like:
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
This should return:
[[3 8 8] [5 3 8]]
Which is happening correctly using this:
np_array = np.array(score_list)
sum_array = np_array.sum(axis=0)
print(sum_array)
However, if I have irregular shape like this:
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
I expect it to return:
[[3 8] [5 3 8]]
However, it produces a warning and the return value is:
[list([1, 1, 2, 7]) list([1, 2, 5, 4, 1, 3])]
How can I get expected result?
NumPy will try to cast the ragged list into an ndarray, which fails. Instead, consider passing each group of sublists individually using zip.
score_list = [
[[1,1], [1,2,5]],
[[2,7], [4,1,3]]
]
import numpy as np
res = [np.sum(x,axis=0) for x in zip(*score_list)]
print(res)
[array([3, 8]), array([5, 3, 8])]
Here is one solution for doing this, keep in mind that it doesn't use numpy and will be very inefficient for larger matrices (but for smaller matrices runs just fine).
# Create matrix
score_list = [
[[1,1,3], [1,2,5]],
[[2,7,5], [4,1,3]]
]
# Get each row
for i in range(1, len(score_list)):
    # Get each list within the row
    for j in range(len(score_list[i])):
        # Get each value in each list
        for k in range(len(score_list[i][j])):
            # Add current value to the same index
            # on the first row
            score_list[0][j][k] += score_list[i][j][k]
print(score_list[0])
There is bound to be a better solution but this is a temporary fix for you :)
Edit. Made more efficient
A possible solution:
a = np.vstack([np.array(score_list[x], dtype='object')
for x in range(len(score_list))])
[np.add(*[x for x in a[:, i]]) for i in range(a.shape[1])]
Another possible solution:
a = sum(score_list, [])
b = [a[x] for x in range(0,len(a),2)]
c = [a[x] for x in range(1,len(a),2)]
[np.add(x[0], x[1]) for x in [b, c]]
Output:
[array([3, 8]), array([5, 3, 8])]

indexing a matrix from a vector array

I have two images, one is a RGB image and the other is a mask image that contains 0 and 1 to segment a specified object. (both images are of the same object)
I want to extract the RGB values of the initial image only at the indexes where the second matrix is 1, so that the final result is an image of just the object with a black background.
Is there a simple way to achieve this in numpy?
I would like to solve this problem without using too many for loops. I think there should be a straightforward way in numpy, but I have not had any luck so far.
You can use NumPy's built-in broadcasting and simply multiply the two arrays element-wise.
import numpy as np
img = np.array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]]) # shape (2, 2, 3)
mask = np.array([[0,1],[0,1]]) # shape (2, 2)
masked_img = img * np.expand_dims(mask, -1)
Alternatively, you can expand the mask dimensions via np.newaxis:
masked_img = img * mask[..., np.newaxis]
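Multiplication relies on the mask containing only 0 and 1. As a hedged alternative, np.where selects pixels explicitly and lets you choose a different background value. This is a sketch using the same toy img and mask as above, not part of the original answer:

```python
import numpy as np

img = np.array([[[ 1,  2,  3],
                 [ 4,  5,  6]],
                [[ 7,  8,  9],
                 [10, 11, 12]]])      # shape (2, 2, 3)
mask = np.array([[0, 1],
                 [0, 1]])             # shape (2, 2)

# Broadcast the (2, 2, 1) boolean mask against the (2, 2, 3) image;
# pixels where the mask is False are replaced by 0 (black).
masked_img = np.where(mask[..., np.newaxis].astype(bool), img, 0)
print(masked_img)
```

For a 0/1 mask this gives the same result as img * mask[..., np.newaxis].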
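Yes.
You can use numpy.multiply. Note that the example below stores the channels along the first axis, so img has shape (3, 2, 2) and the mask is applied to each channel in turn.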
For example
img = np.array([
[[1,2],[3,4]],
[[5,6],[7,8]],
[[9,10],[11,12]]
])
mask = np.array(
[[0,1],[0,1]]
)
print(np.array([
np.multiply(img[0],mask),
np.multiply(img[1],mask),
np.multiply(img[2],mask)]
))
# Res:
#[[[ 0 2]
# [ 0 4]]
#
# [[ 0 6]
# [ 0 8]]
#
# [[ 0 10]
# [ 0 12]]]

Pandas series to array conversion is getting me arrays of array objects

I have a Pandas series and here are the first two rows:
X.head(2)
Which has 1D arrays for each row: the column header is mels_flatten
mels_flatten
0 [0.0171469795289, 0.0173154008662, 0.395695541...
1 [0.0471267533454, 0.0061760868171, 0.005647608...
I want to store the values in a single array to feed to a classifier model.
np.vstack(X.values)
or
np.array(X.values)
both return the following:
array([[ array([ 1.71469795e-02, 1.73154009e-02, 3.95695542e-01, ...,
2.35955651e-04, 8.64118460e-04, 7.74663408e-04])],
[ array([ 0.04712675, 0.00617609, 0.00564761, ..., 0.00277199,
0.00205229, 0.00043118])],
I am not sure how to process array of array objects.
My expected result is:
array([[ 1.71469795e-02, 1.73154009e-02, 3.95695542e-01, ...,
2.35955651e-04, 8.64118460e-04, 7.74663408e-04]],
[ 0.04712675, 0.00617609, 0.00564761, ..., 0.00277199,
0.00205229, 0.00043118]],
Have tried np.concatenate and np.resize as some other posts suggested with no luck.
I find it likely that not all of your 1d arrays are the same length, i.e. your series is not compatible with a rectangular 2d array.
Consider the following dummy example:
import pandas as pd
import numpy as np
X = pd.Series([np.array([1,2,3]),np.array([4,5,6])])
# 0 [1, 2, 3]
# 1 [4, 5, 6]
# dtype: object
np.vstack(X.values)
# array([[1, 2, 3],
# [4, 5, 6]])
As the above demonstrates, a collection of 1d arrays (or lists) of the same size will be nicely stacked into a 2d array. Check the size of your arrays, and you'll probably find that there are some discrepancies:
>>> X.apply(len)
0 3
1 3
dtype: int64
If X.apply(len).unique() returns an array with more than one element, you'll see the proof of the problem. In the above rectangular case:
>>> X.apply(len).unique()
array([3])
In a non-conforming example:
>>> Y = pd.Series([np.array([1,2,3]),np.array([4,5])])
>>> np.array(Y.values)
array([array([1, 2, 3]), array([4, 5])], dtype=object)
>>> Y.apply(len).unique()
array([3, 2])
As you can see, the nested array result is coupled to the non-unique length of items inside the original array.
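The length check and the stacking step can be combined. The sketch below is only an illustration: stack_rows is a hypothetical helper name, and it assumes every item in the series is a 1d array or list.

```python
import numpy as np
import pandas as pd

def stack_rows(series):
    """Hypothetical helper: stack a Series of 1-D arrays into a 2-D array."""
    lengths = series.apply(len).unique()
    if len(lengths) != 1:
        # Ragged rows cannot form a rectangular array; report the lengths found
        raise ValueError(f"rows have differing lengths: {lengths.tolist()}")
    return np.vstack(series.values)

X = pd.Series([np.array([1, 2, 3]), np.array([4, 5, 6])])
stacked = stack_rows(X)
print(stacked.shape)  # (2, 3)
```

Calling stack_rows on the non-conforming Y from above would raise the ValueError instead of silently returning an object array.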

Make numpy matrix with insufficient length of data

I have some data, say a list of 10 numbers and I have to convert that list to a matrix of shape (3,4). What would be the best way to do so, if I say I wanted the data to fill by columns/rows and the unfilled spots to have some default value like -1.
Eg:
data = [0,4,1,3,2,5,9,6,7,8]
>>> output
array([[ 0, 4, 1, 3],
[ 2, 5, 9, 6],
[ 7, 8, -1, -1]])
What I thought of doing is
data += [-1]*(row*col - len(data))
output = np.array(data).reshape((row, col))
Is there a simpler method that allows me to achieve the same result without having to modify the original data or sending in data + [-1]*remaining to the np.array function?
I'm sure there are various ways of doing this. My first inclination is to make an output array filled with the 'fill' value, and copy the data to it. Since the fill is 'ragged', not a full column or row, I'd start out 1d and reshape to the final shape.
In [730]: row,col = 3,4
In [731]: data = [0,4,1,3,2,5,9,6,7,8]
In [732]: output=np.zeros(row*col,dtype=int)-1
In [733]: output[:len(data)]=data
In [734]: output = output.reshape(3,4)
In [735]: output
Out[735]:
array([[ 0, 4, 1, 3],
[ 2, 5, 9, 6],
[ 7, 8, -1, -1]])
Regardless of whether data starts as a list or a 1d array, it will have to be copied to output. Since the total number of elements changes, we can't just reshape it.
This isn't that different from your approach of adding the extra values via [-1]*n.
There is a pad function, but it works on whole columns or rows, and internally is quite complex because it's written for general cases.
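A small variation on the same idea, offered as a sketch rather than as the answer's own code: np.full can allocate the array with the fill value directly, and reshape's order argument covers the "fill by columns" case from the question. The names flat, by_rows and by_cols are illustrative.

```python
import numpy as np

row, col = 3, 4
data = [0, 4, 1, 3, 2, 5, 9, 6, 7, 8]

# Allocate a flat array already holding the fill value, then copy data in
flat = np.full(row * col, -1, dtype=int)
flat[:len(data)] = data

by_rows = flat.reshape(row, col)                # fill row by row
by_cols = flat.reshape((row, col), order='F')   # fill column by column
print(by_rows)
print(by_cols)
```

The original data list is left unmodified, since only a copy of its values is written into flat.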
Use np.ndarray.flat to index into the flattened version of the array.
data = [0, 4, 1, 3, 2, 5, 9, 6, 7, 8]
default_value = -1
desired_shape = (3, 4)
output = default_value * np.ones(desired_shape)
output.flat[:len(data)] = data
# output is now:
# array([[ 0., 4., 1., 3.],
# [ 2., 5., 9., 6.],
# [ 7., 8., -1., -1.]])
As hpaulj says, the extra copy is really hard to avoid.
If you are reading data from a file somehow, you could read it into the flattened array directly, either using flat, or by reshaping the array afterward. Then the data gets directly loaded into the array with the desired shape.
I compared the given solutions for speed. The tests were done using IPython 4.2.0 with Python 3.5.2|Anaconda 4.1.1 (64-bit).
The data array starts with 100,000 elements. The new dimensions are 15,000 x 15,000.
M. Klugerford's solution (increasing the data and reshaping):
%timeit data = [x for x in range(100000)]; col=15000; row=15000; data+= [-1]*(row*col-len(data)); output = np.array(data).reshape((row, col))
1 loop, best of 3: 38.8 s per loop
Psidom's solution (using np.pad):
%timeit import numpy as np; data = [x for x in range(100000)]; col=15000; row=15000; np.pad(data, (0, row * col - len(data)), 'constant', constant_values = -1).reshape(row, col)
1 loop, best of 3: 20.4 s per loop
Praveen's solution (using np.ndarray.flat):
%timeit import numpy as np; data = [x for x in range(100000)]; col=15000; row=15000; output = -1 * np.ones((col, row)); output.flat[:len(data)] = data
1 loop, best of 3: 12.2 s per loop
hpaulj's solution (create output first, copy later; the best solution so far!):
%timeit import numpy as np; data = [x for x in range(100000)]; col=15000; row=15000; output=np.zeros(row*col,dtype=int)-1; output[:len(data)]=data; output = output.reshape(col, row)
1 loop, best of 3: 6.28 s per loop
Here is one option using numpy.pad, pad the data with -1 at the end of array and then reshape it:
import numpy as np
data = [0,4,1,3,2,5,9,6,7,8]
row, col = 3, 4
np.pad(data, (0, row * col - len(data)), 'constant', constant_values = -1).reshape(row, col)
# array([[ 0, 4, 1, 3],
# [ 2, 5, 9, 6],
# [ 7, 8, -1, -1]])

(Python--numpy) how to resize and slice a numpy array without a loop?

So say I have this 2d numpy array:
(
[
[1,2,3,4],
[5,6,7,8],
[9,8,7,6],
[5,4,3,2]
]
);
I'd like to sub-sample this and get 2 by 2 like this (indexing every other row and every other column):
(
[
[1,3],
[9,7]
]
)
Is there a way to do this without any loops?
Thank you!
Yes you can use indexing with steps (in your example step would be 2):
import numpy as np
a = np.array([[1,2,3,4], [5,6,7,8], [9,8,7,6], [5,4,3,2]])
a[::2, ::2]
returns
array([[1, 3],
[9, 7]])
The syntax here is [dim1_start:dim1_stop:dim1_step, dim2_start:dim2_stop:dim2_step]
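To illustrate the start and step parts of that syntax, here is a short sketch on the same array (the names odd_rows and odd_cols are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 8, 7, 6],
              [5, 4, 3, 2]])

# Start at row 1 and take every second row; keep every second column
odd_rows = a[1::2, ::2]
print(odd_rows)
# [[5 7]
#  [5 3]]

# A bare ':' keeps the whole axis, so only the columns are sub-sampled here
odd_cols = a[:, 1::2]
print(odd_cols.shape)  # (4, 2)
```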
