I am using Python and looking to iterate through each row of an Nx9 array and extract certain values from the row to form another matrix with them. The N value can change depending on the file I am reading but I have used N=3 in my example. I only require the 0th, 1st, 3rd and 4th values of each row to form into an array which I need to store. E.g:
result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[11, 12, 13, 14, 15, 16, 17, 18, 19],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
#Output matrix of first row should be: ([[1,2],[4,5]])
#Output matrix of second row should be: ([[11,12],[14,15]])
#Output matrix of third row should be: ([[21,22],[24,25]])
I should then end up with N number of matrices formed with the extracted values - a 2D matrix for each row. However, the matrices formed appear 3D so when transposed and subtracted I receive the error ValueError: operands could not be broadcast together with shapes (2,2,3) (3,2,2). I am aware that a (3,2,2) matrix cannot be subtracted from a (2,2,3) so how do I obtain a 2D matrix N number of times? Would a loop be better suited? Any suggestions?
import numpy as np
result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[11, 12, 13, 14, 15, 16, 17, 18, 19],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
a = result[:, 0]
b = result[:, 1]
c = result[:, 2]
d = result[:, 3]
e = result[:, 4]
f = result[:, 5]
g = result[:, 6]
h = result[:, 7]
i = result[:, 8]
output = [[a, b], [d, e]]
output = np.array(output)
output_transpose = output.transpose()
result = 0.5 * (output - output_transpose)
In [276]: result = np.array(
...: [
...: [1, 2, 3, 4, 5, 6, 7, 8, 9],
...: [11, 12, 13, 14, 15, 16, 17, 18, 19],
...: [21, 22, 23, 24, 25, 26, 27, 28, 29],
...: ]
...: )
...:
...: a = result[:, 0]
...
...: i = result[:, 8]
...: output = [[a, b], [d, e]]
In [277]: output
Out[277]:
[[array([ 1, 11, 21]), array([ 2, 12, 22])],
[array([ 4, 14, 24]), array([ 5, 15, 25])]]
In [278]: arr = np.array(output)
In [279]: arr
Out[279]:
array([[[ 1, 11, 21],
[ 2, 12, 22]],
[[ 4, 14, 24],
[ 5, 15, 25]]])
In [280]: arr.shape
Out[280]: (2, 2, 3)
In [281]: arr.T.shape
Out[281]: (3, 2, 2)
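Without arguments, transpose reverses the order of the axes; for a 3d array that exchanges the 1st and last dimensions, so (2, 2, 3) becomes (3, 2, 2).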
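As a quick check of how the axes move, here is a minimal sketch. The array contents are arbitrary and the variable names are illustrative only; it compares the default transpose with one that swaps just the last two axes:

```python
import numpy as np

arr = np.arange(12).reshape(2, 2, 3)

# With no arguments, transpose reverses the axis order: (2, 2, 3) -> (3, 2, 2)
reversed_axes = arr.transpose()

# An explicit axis order swaps only the last two axes: (2, 2, 3) -> (2, 3, 2)
swapped_last_two = arr.transpose(0, 2, 1)

print(reversed_axes.shape)
print(swapped_last_two.shape)
```

The second form is the one that matters when the first axis indexes separate matrices and each matrix should be transposed on its own.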
A cleaner way to make a (N,2,2) array from selected columns is:
In [282]: arr = result[:,[0,1,3,4]].reshape(3,2,2)
In [283]: arr.shape
Out[283]: (3, 2, 2)
In [284]: arr
Out[284]:
array([[[ 1, 2],
[ 4, 5]],
[[11, 12],
[14, 15]],
[[21, 22],
[24, 25]]])
Since the last 2 dimensions are 2, you could transpose them, and take the difference:
In [285]: arr-arr.transpose(0,2,1)
Out[285]:
array([[[ 0, -2],
[ 2, 0]],
[[ 0, -2],
[ 2, 0]],
[[ 0, -2],
[ 2, 0]]])
Another way to get the (N,2,2) array is with a matrix index:
In [286]: result[:,[[0,1],[3,4]]]
Out[286]:
array([[[ 1, 2],
[ 4, 5]],
[[11, 12],
[14, 15]],
[[21, 22],
[24, 25]]])
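Putting the pieces together for a file with an arbitrary number of rows, one possible sketch is shown below. The function name antisymmetric_blocks and its cols parameter are hypothetical, and reshape(-1, 2, 2) is used so that N does not have to be hard-coded:

```python
import numpy as np

def antisymmetric_blocks(rows, cols=(0, 1, 3, 4)):
    """Hypothetical helper: build one 2x2 block per row, then return
    0.5 * (A - A^T) for every block at once."""
    rows = np.asarray(rows)
    # Select the wanted columns, then let -1 infer N from the data
    blocks = rows[:, list(cols)].reshape(-1, 2, 2)
    # Swap only the last two axes so each 2x2 block is transposed separately
    return 0.5 * (blocks - blocks.transpose(0, 2, 1))

result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                   [11, 12, 13, 14, 15, 16, 17, 18, 19],
                   [21, 22, 23, 24, 25, 26, 27, 28, 29]])

out = antisymmetric_blocks(result)
print(out.shape)
print(out[0])
```

No Python loop is needed; out[k] is the 2x2 result for row k.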
Ok, this is not a coding problem, but a math problem. I wrote some code for you, since it's pretty obvious you're a beginner, so there will be some unfamiliar syntax in there that you should look into so you can avoid problems like this in the future. You might not use them all that often, but it's good to know how to do it, because it expands your understanding of python syntax in general.
First up, the complete code for easy copy and pasting:
import numpy as np
result=np.array([[1,2,3,4,5,6,7,8,9],
[11,12,13,14,15,16,17,18,19],
[21,22,23,24,25,26,27,28,29]])
output = np.array(tuple(result[:,i] for i in (0,1,3)))
def Matrix_Operation(Matrix,Coefficient):
    if (Matrix.shape == Matrix.shape[::-1]
            and isinstance(Matrix,np.ndarray)
            and isinstance(Coefficient,(float,int))):
        return Coefficient*(Matrix-Matrix.transpose())
    else:
        print('The shape of your Matrix is not palindromic')
        print('You cannot subtract matrices of unequal shape')
        print('Your shape: %s'%str(Matrix.shape))
print(Matrix_Operation(output,0.5))
Now let's talk about a step by step explanation of what's happening here:
import numpy as np
result=np.array([[1,2,3,4,5,6,7,8,9],
[11,12,13,14,15,16,17,18,19],
[21,22,23,24,25,26,27,28,29]])
Python uses indentation (alignment of whitespace) as an integral part of its syntax. However, inside brackets you usually don't need aligned indentation for the interpreter to understand your code. If you provide a large array of values manually, it is usually advisable to start new lines at the commas (here, the commas separating the sublists). It's just more readable, and that way your data isn't off screen in your editor.
output = np.array(tuple(result[:,i] for i in (0,1,3)))
Comprehensions are a big deal in Python and really handy for quick one-liners, and they are one of the reasons Python is so pleasant to work with. Here I build a sequence of arrays, one result[:,i] for every i in (0,1,3). Strictly speaking, this is a generator expression passed to tuple() rather than a list comprehension, so the arrays are collected into a tuple instead of a list. Finally I wrapped it in the np.array function, since this is the type required for our mathematical operations later on.
def Matrix_Operation(Matrix,Coefficient):
    if (Matrix.shape == Matrix.shape[::-1]
            and isinstance(Matrix,np.ndarray)
            and isinstance(Coefficient,(float,int))):
        return Coefficient*(Matrix-Matrix.transpose())
    else:
        print('The shape of your Matrix is not palindromic')
        print('You cannot subtract matrices of unequal shape')
        print('Your shape: %s'%str(Matrix.shape))
print(Matrix_Operation(output,0.5))
If you're gonna create a complex formula in python code, why not wrap it inside an abstractable function? You can incorporate a lot of "quality control" into a function as well, to check if it is given the correct input for the task it is supposed to do.
Your code failed because you were trying to subtract a (3,2,2) shaped matrix from a (2,2,3) shaped one. So we'll need a code snippet to check whether our provided matrix has a palindromic shape. You can reverse the order of items in a sequence with Container[::-1], so we ask whether Matrix.shape == Matrix.shape[::-1]. Further, it is necessary that our Matrix is an np.ndarray and that our coefficient is a number. That's what I'm doing with the isinstance() function. You can check for multiple types at once, which is why isinstance(Coefficient,(float,int)) contains a tuple with both int and float in it.
Now that we have ensured that our input makes sense, we can perform our Matrix_Operation.
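A variant of the same checks is sketched below, on the assumption that raising an exception is preferable to printing a message and silently returning None. The name scaled_antisymmetric_part is hypothetical and not part of the code above:

```python
import numpy as np

def scaled_antisymmetric_part(matrix, coefficient):
    """Hypothetical variant: validate the inputs, then return
    coefficient * (matrix - matrix.transpose())."""
    if not isinstance(matrix, np.ndarray):
        raise TypeError("matrix must be a numpy.ndarray")
    if not isinstance(coefficient, (float, int)):
        raise TypeError("coefficient must be a float or an int")
    # A palindromic shape reads the same reversed, e.g. (2, 2) or (2, 3, 2)
    if matrix.shape != matrix.shape[::-1]:
        raise ValueError(f"shape {matrix.shape} is not palindromic")
    return coefficient * (matrix - matrix.transpose())

square = np.array([[1, 2], [4, 5]])
print(scaled_antisymmetric_part(square, 0.5))

try:
    scaled_antisymmetric_part(np.zeros((2, 2, 3)), 0.5)
except ValueError as err:
    print(err)
```

With exceptions, a caller cannot accidentally continue computing with a None result.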
So in conclusion: Check if your math is solid before asking SO for help, because people here can get pretty annoyed at that sort of thing. You probably noticed by now that someone has already downvoted your question. Personally, I believe it's necessary to let newbies ask a couple stupid questions before they get into the groove, but that's what the voting button is for, I guess.
In the following program, I am trying to understand how np.concatenate command works. After accessing each row of the array a by for loop, when I concatenate along row axis I expect a 2-dimensional array having the shape of (5,5) but it changes.
I want to have the same dimension (5,5) after concatenation. How can I do that?
I tried to repeat the above method for the 2-dimensional array by storing them in a list [(2,5),(2,5),(2,5)]. At the end when I concatenate it gives me the shape of (6,5) as expected but in the following case, it is different.
import numpy as np
a = np.arange(25).reshape(5,5)
ind =[0,1,2,3,4]
list=[]
for i in ind:
    list.append(a[i])
new= np.concatenate(list, axis=0)
print(list)
print(len(list))
print(new)
print(new.shape)
This gives the following results for new:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
and for new.shape:
(25,)
To preface this you really should not be using concatenate here.
Setup
a = np.arange(25).reshape(5,5)
L = [i for i in a]
Your question asks:
Why is np.concatenate changing dimension?
It's not changing dimension, it is doing exactly what it is supposed to based on the input you are giving it. From the documentation:
Join a sequence of arrays along an existing axis
When you pass your list to concatenate, don't think of it as passing a (5, 5) list, think of it as passing 5 (5,) shape arrays, which are getting joined along axis 0, which will intuitively produce a (25,) shape output.
Now this behavior also gives insight on how to work around this. If passing 5 (5,) shape arrays produces a (25,) shape output, we just need to pass (1, 5) shape arrays to produce a (5, 5) shape output. We can accomplish this by simply adding a dimension to each element of L:
np.concatenate([[i] for i in L])
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
However, the much better way to approach this is to simply use stack, vstack, etc.
>>> np.stack(L)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> np.vstack(L)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
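To see the shapes side by side, here is a small sketch; the variable names are illustrative only. It joins the same five rows three ways:

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
rows = [row for row in a]          # five arrays, each of shape (5,)

# Joining (5,) arrays along their only axis gives one long (25,) array
flat = np.concatenate(rows, axis=0)

# Giving each row a leading axis makes it (1, 5), so the result is (5, 5)
with_new_axis = np.concatenate([r[np.newaxis, :] for r in rows], axis=0)

# stack creates the new axis itself
stacked = np.stack(rows, axis=0)

print(flat.shape, with_new_axis.shape, stacked.shape)
```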
I am currently monkey-patching a Scikit-Learn function and one of the lines requires a NumPy array with 2 dimensions. However, the data I am working with is a NumPy array with 3 dimensions, which raises the error "too many values to unpack."
I am calling the K-Means function fit to cluster the data. My problem boils down to this following line of code, assuming X is the ndarray that I pass in:
n_samples, n_features = X.shape
X is an array with 3 dimensions, like the following:
X = np.array([[[1, 2, 3],
[4, 5, 6]],
[[7, 8, 9],
[10, 11, 12]],
[[13, 14, 15],
[16, 17, 18]]])
The data represents a group of time series of data points that have 6 dimensions. For example, the first element, [[1, 2, 3], [4, 5, 6]] would represent a time series with samples from 2 time periods, each sample with 3 dimensions.
And I have monkey-patched the k_means_ code to allow me to perform clustering on an ndarray of ndarrays. My goal is to perform k-means on 2D arrays.
Is it possible to set the shape of the 3D ndarray to 2 elements? For example, I tried converting the 3D array to a 2D array of objects but it ends up getting converted back to a 3D array.
np.array([[x.astype(object) for x in c] for c in combined])
Likewise, the following code is also converted back to a 3D array.
np.array([[np.array(x) for x in c] for c in combined])
The list comprehension [[x.astype(object) for x in c] for c in combined] looks like it creates the correct array, but because it is of type list, it no longer works in the function.
I am looking for some way to "convert" a 3D NumPy array into 2 dimensions. Any help would be greatly appreciated!
Note: I am not looking for a way to reshape the array. I need to keep all the dimensions but change the shape to ignore one of the dimensions.
To make an array of arrays, we have to play some tricks, because np.array tries to make as high-dimensional an array as it can. If the subarrays vary in size that is ok, but if they are all the same size we have to fight that.
Here's one way:
start with a 3d array:
In [812]: arr = np.arange(24).reshape(2,3,4)
and an empty object array of the right size (but flattened)
In [813]: A = np.empty((6,),object)
copy values (again with flattening), and reshape to the target shape
In [814]: A[:]=list(arr.reshape(-1,4))
In [815]: A=A.reshape(2,3)
In [816]: A
Out[816]:
array([[array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])],
[array([12, 13, 14, 15]), array([16, 17, 18, 19]),
array([20, 21, 22, 23])]], dtype=object)
So now we have a (2,3) array, whose shape can be unpacked.
I tried to start with np.empty((2,3),object), but couldn't get the A[:,:]=... assignment to work. For this object reshaping to work we have to split arr into a list of arrays. An object array is, like a list, an array of pointers.
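One alternative that avoids the flatten-then-reshape step, offered only as a sketch, is to fill a 2d object array element by element. The loop below is an assumption about what is acceptable here (it is slower than slicing for large arrays):

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

# Object array with the first two dimensions of arr
A = np.empty(arr.shape[:2], dtype=object)

# Assigning to a single element stores the (4,) subarray as one object
for idx in np.ndindex(*A.shape):
    A[idx] = arr[idx]

n_rows, n_cols = A.shape   # unpacks into two values
print(A.shape, A[1, 2])
```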
But will the scikit functions accept such an array? (after passing the shape hurdle). I suspect the object reshaping is a short sighted solution.
In [824]: [[x.astype(object) for x in c] for c in arr]
Out[824]:
[[array([0, 1, 2, 3], dtype=object),
array([4, 5, 6, 7], dtype=object),
array([8, 9, 10, 11], dtype=object)],
[array([12, 13, 14, 15], dtype=object),
array([16, 17, 18, 19], dtype=object),
array([20, 21, 22, 23], dtype=object)]]
In [825]: _[0][0].shape
Out[825]: (4,)
This creates a nested list of lists, with the inner elements being (4,) object array. Wrap that in np.array and it recreates a 3d array with dtype object.
Reshaping, which for some unknown reason you don't want to do, preserves the numeric dtype:
In [828]: arr.reshape(2,-1)
Out[828]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
In [829]: arr.reshape(-1,4)
Out[829]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
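If flattening each time series into a single feature vector is acceptable (an assumption, since it discards the explicit time axis), the shape unpacking from the question succeeds on the reshaped array. A minimal sketch using the X from the question:

```python
import numpy as np

X = np.array([[[1, 2, 3],
               [4, 5, 6]],
              [[7, 8, 9],
               [10, 11, 12]],
              [[13, 14, 15],
               [16, 17, 18]]])

# Keep one row per time series; -1 folds the remaining axes into features
X2d = X.reshape(X.shape[0], -1)

n_samples, n_features = X2d.shape
print(n_samples, n_features)
```

Each row of X2d then holds the 6 values of one time series, with the numeric dtype preserved.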
I am trying to copy a section of an input 2d array "img", mirror that section, and copy it into the 2d array "out".
The following code does what I need
a = numpy.zeros(shape=(pad, pad))
a[:,:]=img[0:pad,0:pad]
out[0:pad,0:pad]=a[::-1,::-1]
But simply doing the following does not
out[0:pad,0:pad]=img[0:pad:-1,0:pad:-1]
and instead returns ValueError: could not broadcast input array from shape (0,0) into shape (2,2) for pad=2, and I am not sure why.
img[0:pad:-1,0:pad:-1]
should be
img[pad-1::-1, pad-1::-1]
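since you want the index to start at pad-1 and step down to 0. With a negative step, a slice is empty unless its stop lies below its start, which is why 0:pad:-1 selects nothing. See the NumPy documentation on basic slicing for the complete rules.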
For example,
import numpy as np
img = np.arange(24).reshape(6,4)
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11],
# [12, 13, 14, 15],
# [16, 17, 18, 19],
# [20, 21, 22, 23]])
pad = 2
out = img[pad-1::-1, pad-1::-1]
print(out)
yields
[[5 4]
[1 0]]
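The difference between the two slices can be checked directly. In this sketch (variable names are illustrative only), the original slice comes out empty while the corrected one matches the two-step copy-then-reverse approach from the question:

```python
import numpy as np

img = np.arange(24).reshape(6, 4)
pad = 2

# Start 0, stop pad, step -1: nothing lies "below" 0 on the way to pad
empty = img[0:pad:-1, 0:pad:-1]

# Start at pad-1 and step down through index 0
mirrored = img[pad-1::-1, pad-1::-1]

# Equivalent two-step form: take the block, then reverse both axes
two_step = img[0:pad, 0:pad][::-1, ::-1]

print(empty.shape)
print(np.array_equal(mirrored, two_step))
```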
How can I efficiently assign to a row in a csr_matrix?
This gives the error:
Q[mid, :] = new_Q
Q is a csr_matrix, and new_Q is the result of Q.getrow(i).
I'm using the latest version of scipy.
Am I using the right matrix type?
I want to find the right matrix type for two matrices I'm using: Q and B.
I'm modifying one row of matrix Q at a time, and modifying one column of B at a time. It seems like I should create Q as a lil_matrix or a csr_matrix. What type of matrix should B be? A csc_matrix?
It seems to work fine in 0.13.2:
>>> import numpy as np
>>> import scipy as sp
>>> sp.__version__
'0.13.2'
>>> import scipy.sparse as sps
>>> a = sps.csr_matrix(np.arange(25).reshape(5, 5))
>>> a.A
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> a[3] = a.getrow(0)
>>> a.A
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[ 0, 1, 2, 3, 4],
[20, 21, 22, 23, 24]])
As for the answer to your second question, it depends on whether you are changing the sparsity pattern of the rows (columns) or not. If you are not, then yes, CSR for row, CSC for columns. But if you are, then CSR and CSC are going to struggle, because every row (column) update is going to require copying the data for the whole matrix every time. In this latter case you may want to try to use LIL for both your Q and B matrices, but work with B.T so that you are accessing rows.
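A minimal sketch of the LIL suggestion follows, assuming the sparsity pattern does change and that converting to CSR/CSC once at the end is acceptable. The matrices and values are made up for illustration, and B_T is a hypothetical name for the transposed B:

```python
import numpy as np
import scipy.sparse as sps

# Q: modified one row at a time, so build it as LIL
Q = sps.lil_matrix(np.arange(25).reshape(5, 5))
Q[3, :] = Q[0, :].toarray()        # overwrite row 3 with a copy of row 0

# B: modified one column at a time, so store its transpose as LIL
B_T = sps.lil_matrix((5, 5))
B_T[1, :] = np.array([[1, 0, 0, 2, 0]])   # row 1 of B_T is column 1 of B

# Convert once the edits are done
Q_csr = Q.tocsr()
B = B_T.T.tocsc()

print(Q_csr.toarray())
print(B.toarray())
```

Row updates on a LIL matrix only touch that row's lists, whereas CSR and CSC may need to rebuild their index arrays when the pattern changes.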