I need to convert many arrays into one matrix, where each array becomes one column. I tried np.column_stack, but it doesn't work for me. I want to go from this:
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
[1 0 0 ... 0 0 1]
to this
[1 1 1 1 1 1
0 0 0 0 0 0
0 0 0 0 0 0
. . . . . .
. . . . . .
. . . . . .
0 0 0 0 0 0
0 0 0 0 0 0
1 1 1 1 1 1 ]
So you have a list of arrays:
In [3]: alist = [np.array([1,0,0,1]) for i in range(3)]
In [4]: alist
Out[4]: [array([1, 0, 0, 1]), array([1, 0, 0, 1]), array([1, 0, 0, 1])]
Join them to become rows of a 2d array:
In [5]: np.vstack(alist)
Out[5]:
array([[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 0, 0, 1]])
to become columns:
In [6]: np.column_stack(alist)
Out[6]:
array([[1, 1, 1],
[0, 0, 0],
[0, 0, 0],
[1, 1, 1]])
The code in your comment is unclear, but:
for i in range(6):
np.column_stack((arrays[i]))
doesn't make sense, nor does it follow the column_stack docs. column_stack makes a new array; it does not operate in place. List append does operate in place, and is a good choice when building a list iteratively, but it should not be taken as a model for building arrays iteratively.
All the concatenate and stack functions take a list of arrays as input. Take advantage of that. And remember, they return a new array on each call. (That applies to np.append as well, but I discourage using it.)
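A minimal sketch of that advice, with made-up stand-in data (six identical arrays of length 5) and illustrative names `columns` and `result`: append to a Python list inside the loop, then call `np.column_stack` once at the end.

```python
import numpy as np

# Stand-in data: six 1D arrays shaped like the ones in the question.
arrays = [np.array([1, 0, 0, 0, 1]) for _ in range(6)]

columns = []
for arr in arrays:
    # list.append works in place, so growing a list in a loop is cheap
    columns.append(arr)

# A single call builds the 2D array; each 1D array becomes one column.
result = np.column_stack(columns)
print(result.shape)  # (5, 6)
```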
Another option in the stack family:
In [7]: np.stack(alist, axis=1)
Out[7]:
array([[1, 1, 1],
[0, 0, 0],
[0, 0, 0],
[1, 1, 1]])
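I would put all the arrays in one list and then reshape it:
import numpy as np
l = [[1, 1, 1, 0, 1, 1], [1, 0, 0, 1, 0, 1]]
# the new shape must be passed as one tuple; this puts all 12 values into a single column
l = np.reshape(l, (len(l) * len(l[0]), 1))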
Since what you want is basically vertical stacking of 1D arrays, it makes sense to use np.vstack and then transpose the result using .T:
my_array = np.array([1,0,0,0,0,0,1])
result = np.vstack([my_array] * 6).T
Here I assume you just copy the 1D array 6 times, but alternatively you can pass a list of 1D arrays as an argument to np.vstack.
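For a list of *different* 1D arrays of equal length, `np.vstack(...).T`, `np.column_stack` and `np.stack(..., axis=1)` should all give the same 2D array. A small check, using hypothetical example arrays:

```python
import numpy as np

# Hypothetical example data: three different 1D arrays of the same length.
arrays = [np.array([1, 0, 0, 1]), np.array([0, 1, 1, 0]), np.array([1, 1, 0, 0])]

via_vstack = np.vstack(arrays).T             # stack as rows, then transpose
via_column_stack = np.column_stack(arrays)   # stack directly as columns
via_stack = np.stack(arrays, axis=1)         # the new axis 1 indexes the arrays

print(via_column_stack)
print(np.array_equal(via_vstack, via_column_stack)
      and np.array_equal(via_stack, via_column_stack))  # True
```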
You can use numpy.asmatrix as shown below. The last steps convert the matrix to a one-column matrix, as requested.
EDIT
As hpaulj pointed out, np.array (ndarray) is generally preferred now, but if you are using the matrix type, the solution below works for this example.
import numpy as np
a1 = [ 1, 2, 3, 4, 5]
a2 = [ 6, 7, 8, 9, 10]
a3 = [11, 12, 13, 14, 15]
mat = np.asmatrix([a1, a2, a3])
mat
## matrix([[ 1, 2, 3, 4, 5],
## [ 6, 7, 8, 9, 10],
## [11, 12, 13, 14, 15]])
mat.shape
## (3, 5)
### If you want to reshape the final matrix
mat2 = mat.reshape(1, 15)
mat2.shape
## (1, 15)
### Convert to 1 column: You can also transpose it.
mat2.transpose().shape
## (15, 1)
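Since plain `ndarray` is the recommended type, here is a sketch of the same steps with `np.array` instead of `np.asmatrix`; `reshape(-1, 1)` lets NumPy infer the number of rows.

```python
import numpy as np

a1 = [1, 2, 3, 4, 5]
a2 = [6, 7, 8, 9, 10]
a3 = [11, 12, 13, 14, 15]

arr = np.array([a1, a2, a3])
print(arr.shape)          # (3, 5)

# Flatten everything into a single column; -1 means "infer this dimension".
one_column = arr.reshape(-1, 1)
print(one_column.shape)   # (15, 1)

# Same result as the matrix version: reshape to one row, then transpose.
print(np.array_equal(arr.reshape(1, 15).T, one_column))  # True
```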
I'm trying to replace the values in specific columns with zero in Python, where the column numbers are specified in another array.
Given the following 2 numpy arrays
a = np.array([[ 1, 2, 3, 4],
[ 1, 2, 1, 2],
[ 0, 3, 2, 2]])
and
b = np.array([1,3])
b indicates column numbers in array "a" where values need to be replaced with zero.
So the expected output is
([[ 1, 0, 3, 0],
[ 1, 0, 1, 0],
[ 0, 0, 2, 0]])
Any ideas on how I can accomplish this? Thanks.
Your question is:
I'm trying to replace values in specific columns with zero with python, and the column numbers are specified in another array.
This can be done like this:
a[:,b] = 0
Output:
[[1 0 3 0]
[1 0 1 0]
[0 0 2 0]]
The Integer array indexing section of Indexing on ndarrays in the numpy docs has some similar examples.
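`a[:, b] = 0` modifies `a` in place. If the original must be kept, a sketch of a non-mutating variant simply works on a copy first (the `.copy()` step and the name `zeroed` are additions for illustration):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [1, 2, 1, 2],
              [0, 3, 2, 2]])
b = np.array([1, 3])

# Work on a copy so that `a` is left unchanged.
zeroed = a.copy()
zeroed[:, b] = 0   # every row, only the columns listed in b

print(zeroed)
print(a[0])        # original first row is untouched: [1 2 3 4]
```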
A simple for loop will accomplish this.
for column in b:
for row in range(len(a)):
a[row][column] = 0
print(a)
[[1 0 3 0]
[1 0 1 0]
[0 0 2 0]]
I have two arrays
arr1 = np.array([[4, 1, 3, 2, 5], [5, 2, 4, 1, 3]])
arr2 = np.array([[2], [1]])
I want to transform array 1 to a binary array using the elements of the array 2 in the following way
For row 1 of array 1, I want to use row 1 of array 2, i.e. 2, to set the top 2 values of that row to 1 and the rest to 0.
Similarly, for row 2 of array 1, I want to use row 2 of array 2, i.e. 1, to set the top value of that row to 1 and the rest to 0.
So arr1 would get transformed as follows
arr1_transformed = np.array([[1, 0, 0, 0, 1], [1, 0, 0, 0, 0]])
Here is what I tried.
arr1_sorted_indices = np.argsort(-arr1)
This gave me the indices of the sorted array (largest value first):
array([[4, 0, 2, 3, 1],
[0, 2, 4, 1, 3]])
Now I think I need to mask this array with the help of arr2 to get the desired output and I'm not sure how to do it.
This should do the job in the case you mentioned:
def transform_arr(arr1, arr2):
    for i in range(len(arr1)):
        if i >= len(arr2):
            # no count given for this row: set every entry to 0
            arr1[i] = [0 for x in arr1[i]]
        else:
            # keep the k largest values of this row (k = 0 keeps none)
            k = arr2[i][0]
            top_values = sorted(arr1[i], reverse=True)[:k]
            arr1[i] = [1 if x in top_values else 0 for x in arr1[i]]
arr1 = [[4, 1, 3, 2, 5], [5, 2, 4, 1, 3]]
arr2 = [[2], [1]]
transform_arr(arr1, arr2)
print(arr1)
You can try the following:
import numpy as np
arr1 = np.array([[4, 1, 3, 2, 5], [5, 2, 4, 1, 3]])
arr2 = np.array([[2],[1]])
r, c = arr1.shape
s = np.argsort(np.argsort(-arr1))
out = (np.arange(c) < arr2)[np.c_[0:r], s] * 1
print(out)
It gives:
[[1 0 0 0 1]
[1 0 0 0 0]]
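The double `argsort` turns each value into its rank within its row (0 for the largest, because the values are negated). Comparing those ranks with `arr2` by broadcasting gives the same mask a little more directly. This is an alternative sketch of the same idea; the `kind="stable"` choice is an assumption so that tied values are ranked by position (earlier first).

```python
import numpy as np

arr1 = np.array([[4, 1, 3, 2, 5], [5, 2, 4, 1, 3]])
arr2 = np.array([[2], [1]])

# Rank of every element within its row: 0 = largest, 1 = second largest, ...
ranks = np.argsort(np.argsort(-arr1, axis=1, kind="stable"), axis=1)

# arr2 has shape (2, 1), so it broadcasts across the columns of each row.
out = (ranks < arr2).astype(int)
print(out)
```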
I am quite new to Python and have read lots of SO questions on this topic; however, none of them answer my needs.
I end up with an ndarray:
[[1, 2, 3]
[4, 5, 6]]
Now I want to pad each element (e.g. [1, 2, 3]) with a tailored padding just for that element. Of course I could do it in a for loop and append each result to a new ndarray but isn't there a faster and cleaner way I could apply this over the whole ndarray at once?
I imagined it could work like:
import numpy as np
myArray = np.array([[1, 2, 3],
[4, 5, 6]])
paddings = [(1, 2),
(2, 1)]
myArray = np.pad(myArray, paddings, 'constant')
But of course this just outputs:
[[0 0 0 0 0 0]
[0 0 1 2 3 0]
[0 0 4 5 6 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
Which is not what I need. The target result would be:
[[0 1 2 3 0 0]
[0 0 4 5 6 0]]
How can I achieve this using numpy?
Here is a loop-based solution that first creates a zeros array sized from the dimensions of the input array and the paddings. Explanation in comments:
In [192]: myArray
Out[192]:
array([[1, 2, 3],
[4, 5, 6]])
In [193]: paddings
Out[193]:
array([[1, 2],
[2, 1]])
# calculate desired shape (original width plus the largest total padding of any row); needed for initializing `padded_arr`
In [194]: target_shape = (myArray.shape[0], myArray.shape[1] + paddings.sum(axis=1).max())
In [195]: padded_arr = np.zeros(target_shape, dtype=np.int32)
In [196]: padded_arr
Out[196]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], dtype=int32)
After this, we can use a for loop to slot fill the sequences from myArray, based on the values from paddings:
In [199]: for idx in range(paddings.shape[0]):
...: padded_arr[idx, paddings[idx, 0]:paddings[idx, 0] + myArray.shape[1]] = myArray[idx]
...:
In [200]: padded_arr
Out[200]:
array([[0, 1, 2, 3, 0, 0],
[0, 0, 4, 5, 6, 0]], dtype=int32)
We have to resort to a loop-based solution because numpy.lib.pad() does not (yet) support this sort of per-row padding, even with all the modes and keyword arguments it already provides.
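The loop can also be avoided with a boolean mask built by broadcasting. This is a sketch of an alternative, not an `np.pad` mode: it assumes each row of `paddings` holds (left, right) counts and pads out to the widest row, so rows with a smaller total padding get extra zeros on the right.

```python
import numpy as np

myArray = np.array([[1, 2, 3],
                    [4, 5, 6]])
paddings = np.array([[1, 2],
                     [2, 1]])

n_rows, n_cols = myArray.shape
width = n_cols + paddings.sum(axis=1).max()   # width of the widest padded row

cols = np.arange(width)          # column indices 0 .. width-1
left = paddings[:, :1]           # left padding per row, shape (n_rows, 1)
# True where a row's original values should go
mask = (cols >= left) & (cols < left + n_cols)

padded_arr = np.zeros((n_rows, width), dtype=myArray.dtype)
# Boolean assignment fills the True positions in row-major order,
# which matches the row-by-row order of myArray.ravel().
padded_arr[mask] = myArray.ravel()
print(padded_arr)
```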
*Question edited/updated to add an example
Hi all! I have this np.array a. Based on its reference values, I want to update array b, which is my matrix. The "1st column" of a represents a code and the "2nd column" is my reference value. The matrix is populated with codes and I must replace them. See the example below.
import numpy as np
a = np.asarray([[0, 11], [1, 22], [2, 33]])
b = np.asarray([[0, 14, 12, 2], [1, 1, 7, 0], [0, 0,3,5], [1, 2, 2, 6]])
In other words: I want to replace the 0, 1, 2 values in "b" by 11, 22, 33, respectively.
What is the best way to do that, considering that my real a array has about 50 codes and my real b matrices have a shape of (850, 850)?
Thanks in advance!
If I understand the question correctly, this example should show what you're asking for.
Assuming a is the matrix as you've listed above, and b is the list you want to write to
import numpy as np
a = np.asarray([[0, 10], [2, 30], [1, 40]])
b = np.zeros(3)
b[a[:, 0]] = a[:, 1]
where a[:, 0] holds the indices to be changed and a[:, 1] holds the values to put there.
If the codes are not-too-large non-negative integers, you just have to build the right lookup table:
lut = np.arange(b.max()+1)
k,v = a.T
lut[k] = v
For:
>>> b
[[ 0 14 12 2]
[ 1 1 7 0]
[ 0 0 3 5]
[ 1 2 2 6]]
>>> lut[b]
[[11 14 12 33]
[22 22 7 11]
[11 11 3 5]
[22 33 33 6]]
Undefined codes are mapped to themselves (code = value).
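If the codes are large, negative, or widely spread, a table sized by `b.max()` becomes wasteful or unusable. A sketch of an alternative that looks codes up with `np.searchsorted` on the sorted codes (the function name `remap` is made up here); like the lookup table, it leaves undefined codes unchanged.

```python
import numpy as np

def remap(b, a):
    """Replace each value of b found in a[:, 0] by the matching a[:, 1]."""
    order = np.argsort(a[:, 0])
    codes, values = a[order, 0], a[order, 1]   # codes sorted ascending
    # Position where each element of b would sit among the sorted codes
    idx = np.searchsorted(codes, b)
    idx = np.clip(idx, 0, len(codes) - 1)      # keep indices in range
    found = codes[idx] == b                    # True only for defined codes
    return np.where(found, values[idx], b)

a = np.asarray([[0, 11], [1, 22], [2, 33]])
b = np.asarray([[0, 14, 12, 2], [1, 1, 7, 0], [0, 0, 3, 5], [1, 2, 2, 6]])
print(remap(b, a))
```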
I have a big csr_matrix and I want to sum groups of rows to obtain a new csr_matrix with the same number of columns but fewer rows. (Context: the matrix is a document-term matrix obtained from sklearn's CountVectorizer, and I want to be able to quickly combine documents according to codes associated with them.)
For a minimal example, this is my matrix:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import vstack
row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))
print(A.toarray())
[[1 0 0 0 0]
[0 0 3 0 0]
[0 5 0 0 0]
[4 0 0 0 0]
[0 0 2 0 0]]
Now let's say I want a new matrix B in which rows (1, 4) and (2, 3, 5) are combined by summing them, which would look like this:
[[5 0 0 0 0]
[0 5 5 0 0]]
And should be again in sparse format (because the real data I'm working with is large). I tried to sum over slices of the matrix and then stack it:
idx1 = [1, 4]
idx2 = [2, 3, 5]
A_sub1 = A[idx1, :].sum(axis=1)
A_sub2 = A[idx2, :].sum(axis=1)
B = vstack((A_sub1, A_sub2))
But this gives me the summed-up values only for the non-zero columns in the slice, so I can't combine it with the other slices, because the summed slices have different numbers of columns.
I feel like there must be an easy way to do this. But I couldn't find any discussion of this online or in the documentation. What am I missing?
Thank you for your help
Note that you can do this by carefully constructing another matrix. Here's how it would work for a dense matrix:
>>> S = np.array([[1, 0, 0, 1, 0,], [0, 1, 1, 0, 1]])
>>> np.dot(S, A.toarray())
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
>>>
The sparse version is only a little more complicated. The information about which rows should be summed together is encoded in row:
col = range(5)
row = [0, 1, 1, 0, 1]
dat = [1, 1, 1, 1, 1]
S = csr_matrix((dat, (row, col)), shape=(2, 5))
result = S * A
# check that the result is another sparse matrix
print(type(result))
# check that the values are the ones we want
print(result.toarray())
Output:
<class 'scipy.sparse.csr.csr_matrix'>
[[5 0 0 0 0]
[0 5 5 0 0]]
You can handle more rows in your output by including higher values in row and extending the shape of S accordingly.
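Generalizing that, the `row` array of S can come straight from a per-document group label. A sketch, where `labels` (one 0-based group id per row of `A`) and the function name `sum_rows_by_label` are assumptions for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

def sum_rows_by_label(A, labels):
    """Return a sparse matrix whose row g is the sum of A's rows with label g."""
    labels = np.asarray(labels)
    n = A.shape[0]
    # S[g, i] = 1 when row i of A belongs to group g
    S = csr_matrix((np.ones(n, dtype=A.dtype), (labels, np.arange(n))),
                   shape=(labels.max() + 1, n))
    return S @ A   # sparse times sparse stays sparse

row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))

# Rows 1 and 4 -> group 0; rows 2, 3 and 5 -> group 1 (1-based, as in the question).
B = sum_rows_by_label(A, [0, 1, 1, 0, 1])
print(B.toarray())
```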
The indexing should be:
idx1 = [0, 3] # rows 1 and 4
idx2 = [1, 2, 4] # rows 2,3 and 5
Then you need to keep A_sub1 and A_sub2 in sparse format and use axis=0:
A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))
A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
B = vstack((A_sub1, A_sub2))
B.toarray()
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
Note: I think the A[idx, :].sum(axis=0) operations leave the sparse format (the sum returns a dense numpy matrix), so @Mr_E's answer is probably better.
Alternatively, it works when you use axis=0 and np.vstack (as opposed to scipy.sparse.vstack):
A_sub1 = A[idx1, :].sum(axis=0)
A_sub2 = A[idx2, :].sum(axis=0)
np.vstack((A_sub1, A_sub2))
Giving:
matrix([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])