I want to initialize an empty array with 3 columns that I can add rows to. I need to perform some L2-norm distance calculations on the rows after I have added them, and I'm having the following problem.
I start with an initial empty array:
accepted_clusters = np.array([])
Then I add my first 1x3 set of values to this:
accepted_clusters = np.append(accepted_clusters, X_1)
returning:
[ 0.47843416 0.50829221 0.51484499]
Then I add a second set of 1x3 values in the same way, and I get the following:
[ 0.47843416 0.50829221 0.51484499 0.89505277 0.8359252 0.21434642]
However, what I want is something like this:
[ 0.47843416 0.50829221 0.51484499]
[ 0.89505277 0.8359252 0.21434642]
.. and so on
This would enable me to calculate distances between the rows. Ideally, the initial empty vector would be of undefined length, but something like a 10x3 of zeros would also work if the code for that is easy.
The most straightforward way is to use np.vstack:
In [9]: arr = np.array([1,2,3])
In [10]: x = np.arange(20, 23)
In [11]: arr = np.vstack([arr, x])
In [12]: arr
Out[12]:
array([[ 1, 2, 3],
[20, 21, 22]])
Note that growing an array this way is costly: every np.vstack call copies the whole array, so doing the above in a loop gives you quadratic complexity. It is usually better to collect the rows in a list and convert to an array once at the end (which is linear-time), or to rethink the approach entirely.
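A minimal sketch of the list-then-convert approach, followed by the pairwise L2 distances the question asks about. The name incoming_rows and its third row are made up for illustration; the broadcasting-based distance computation is one common way to do it, not the only one.

```python
import numpy as np

# Hypothetical input: 1x3 rows arriving one at a time.
incoming_rows = [
    [0.47843416, 0.50829221, 0.51484499],
    [0.89505277, 0.8359252, 0.21434642],
    [0.1, 0.2, 0.3],
]

rows = []                          # plain Python list; appending is cheap
for row in incoming_rows:
    rows.append(row)

accepted_clusters = np.array(rows)  # single conversion, shape (3, 3)

# Pairwise L2 distances between rows via broadcasting:
# diff[i, j] is row i minus row j, so diff has shape (3, 3, 3).
diff = accepted_clusters[:, None, :] - accepted_clusters[None, :, :]
distances = np.linalg.norm(diff, axis=-1)   # shape (3, 3)
```

distances[i, j] then holds the Euclidean distance between row i and row j, with zeros on the diagonal.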
Or, as you imply, you could pre-allocate your array:
In [18]: result = np.zeros((10, 3))
In [19]: result
Out[19]:
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
In [20]: result[0] = x
In [21]: result
Out[21]:
array([[ 20., 21., 22.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
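When pre-allocating, it helps to track how many rows have actually been filled so that the unused zero rows do not end up in later distance calculations. A small sketch, where capacity, n_filled and the sample rows are assumptions made for illustration:

```python
import numpy as np

capacity = 10                       # assumed upper bound on the number of rows
result = np.zeros((capacity, 3))
n_filled = 0                        # number of rows written so far

for row in ([20, 21, 22], [1, 2, 3]):   # made-up sample rows
    result[n_filled] = row
    n_filled += 1

filled = result[:n_filled]          # view of the used rows only, shape (2, 3)
```

Because filled is a view, no data is copied when slicing off the unused part.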
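You can use vstack to add rows, provided the empty array starts out with shape (0, 3) rather than (0,); np.vstack raises a ValueError if it is asked to stack np.array([]) with a 3-element row:
accepted_clusters = np.empty((0, 3))
accepted_clusters = np.vstack([accepted_clusters, (0.89505277, 0.8359252, 0.21434642)])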
Let's say I have the following NumPy array matrix:
import numpy as np
A = np.zeros((4,4))
A
Out[7]:
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Next, say I have the following index lists:
row_indexes = [0,2,3]
column_indexes = [0,2,1]
and a list of corresponding values:
values = [10, 20, 30]
My question is: How can I insert the list of values into the matrix A as efficiently as possible (computation time is relevant) at the row/column index locations specified by row_indexes and column_indexes so that after this operation, A would equal:
array([[10., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 20., 0.],
[ 0., 30., 0., 0.]])
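No answer to this question is included here. One common approach, sketched below with the question's own names, is integer-array ("fancy") indexing, which pairs the two index lists element-wise and assigns all values in a single vectorized call:

```python
import numpy as np

A = np.zeros((4, 4))
row_indexes = [0, 2, 3]
column_indexes = [0, 2, 1]
values = [10, 20, 30]

# A[rows, cols] addresses the elements (0, 0), (2, 2) and (3, 1).
A[row_indexes, column_indexes] = values
```

If the same (row, column) pair can occur more than once and the values should accumulate rather than overwrite, np.add.at(A, (row_indexes, column_indexes), values) is the usual tool.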
I have a 60000 by 200 numpy array. I want to make it 60000 by 201 by adding a column of 1's to the right (so every row is [prev, 1]).
np.concatenate with axis=1 doesn't work, because it seems to require all input arrays to have the same number of dimensions.
How should I do this?
Let me just throw in a very simple example with much smaller size. The principle should be the same.
a = np.zeros((6,2))
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
b = np.ones((6,1))
array([[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.]])
np.hstack((a,b))
array([[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 0., 0., 1.]])
Use the NumPy index trick np.c_ to append a 1D vector to a 2D array:
a = np.zeros((6,2))
# array([[ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.],
# [ 0., 0.]])
b = np.ones(6) # or np.ones((6,1))
#array([1., 1., 1., 1., 1., 1.])
np.c_[a,b]
# array([[0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.],
# [0., 0., 1.]])
Under the covers, all the stack variants (including append and insert) end up calling concatenate. They just precede it with some sort of array reshape.
In [60]: A = np.arange(12).reshape(3,4)
In [61]: np.concatenate([A, np.ones((A.shape[0],1),dtype=A.dtype)], axis=1)
Out[61]:
array([[ 0, 1, 2, 3, 1],
[ 4, 5, 6, 7, 1],
[ 8, 9, 10, 11, 1]])
Here I made a (3,1) array of 1s, to match the (3,4) array. If I wanted to add a new row, I'd make a (1,4) array.
While the variations are handy, if you are learning, you should become familiar with concatenate and the various ways of constructing arrays that match in number of dimensions and necessary shapes.
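To make the shape matching concrete, here is a short sketch that adds a column, a row, and a 1-D vector with plain concatenate. The variable names are illustrative only.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)

new_col = np.ones((A.shape[0], 1), dtype=A.dtype)    # shape (3, 1)
new_row = np.zeros((1, A.shape[1]), dtype=A.dtype)   # shape (1, 4)

with_col = np.concatenate([A, new_col], axis=1)      # shape (3, 5)
with_row = np.concatenate([A, new_row], axis=0)      # shape (4, 4)

# A 1-D vector needs a second axis before it can be concatenated as a column.
v = np.ones(A.shape[0], dtype=A.dtype)               # shape (3,)
with_v = np.concatenate([A, v[:, None]], axis=1)     # shape (3, 5)
```

In each case the inputs agree in number of dimensions and in every axis except the one being joined.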
The first thing to think about is that NumPy arrays are really not meant to change size. So you should ask yourself: can you create your original matrix as 60k x 201 and then fill the last column afterwards? This is usually best.
If you really must do this, see the question "How to add column to numpy array".
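The pre-allocation idea from this answer might look like the sketch below. The small sizes stand in for 60000 and 200, and data is a placeholder for wherever the original values come from.

```python
import numpy as np

n_rows, n_features = 6, 2      # stand-ins for 60000 and 200
# Placeholder for the real source of the values.
data = np.arange(n_rows * n_features, dtype=float).reshape(n_rows, n_features)

# Allocate the final shape once, then fill it in place.
X = np.empty((n_rows, n_features + 1))
X[:, :n_features] = data       # original columns
X[:, -1] = 1.0                 # trailing column of ones
```

This avoids ever holding both a 200-column and a 201-column copy if the data can be written directly into X.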
I think the NumPy function column_stack is more convenient, because you do not need to reshape the new column into a 2-D column array before stacking it onto the matrix. With column_stack an ordinary 1-D array is enough.
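A minimal illustration of that point, reusing the small a and b from the earlier answers:

```python
import numpy as np

a = np.zeros((6, 2))
b = np.ones(6)                      # plain 1-D array, no reshape needed

# column_stack turns 1-D inputs into columns before joining.
stacked = np.column_stack((a, b))   # shape (6, 3)
```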
I have a huge numpy ndarray (called mat and of the shape 700000 x 6000) of which I want to sum through the columns and find the nonzero indices.
I want to sum through it like so:
x = np.sum(mat[:,y], axis=1)
indices = np.nonzero(x)
But the first line immediately gives me an instant Memory Error. Is there a way I can go around using np.sum and do it another way that makes this calculation possible?
You have two problems:
See Sven Marnach's comment: it is possible that your data set is too large for your hardware.
See ajcr's comment: what you want to do is not feasible the way you are trying to do it, because mat[:,an_index] with a single integer index gives you back a one-dimensional array, whose only axis is axis=0.
Another problem is the nature of your array: if it is an array of floating-point numbers, the probability that the sum of 700,000 entries is exactly equal to zero is close to zero. It is not impossible, of course, but it is unlikely.
That said, if you can reduce your data set or improve your hardware, you can do it like this:
In [39]: a = np.zeros((10,5))
In [40]: for i in range(5): a[3,i]=1+2*i if i != 3 else 0.0
In [41]: a
Out[41]:
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 1., 3., 5., 0., 9.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
In [42]: np.sum(a,axis=0)
Out[42]: array([ 1., 3., 5., 0., 9.])
In [43]: np.nonzero(np.sum(a,axis=0))
Out[43]: (array([0, 1, 2, 4]),)
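If y in the question is a list of column indices, mat[:,y] makes a full copy of the selected columns, which may be what triggers the MemoryError. One possible workaround, sketched here under that assumption, is to process the rows in blocks so that only a small copy exists at any time. The function name nonzero_row_sums and the parameter chunk_rows are hypothetical.

```python
import numpy as np

def nonzero_row_sums(mat, cols, chunk_rows=10_000):
    """Indices of rows whose sum over `cols` is nonzero, computed in row blocks."""
    parts = []
    for start in range(0, mat.shape[0], chunk_rows):
        block = mat[start:start + chunk_rows]      # basic slice: a view, no copy
        # Only this block's selected columns are copied.
        parts.append(block[:, cols].sum(axis=1))
    row_sums = np.concatenate(parts)
    return np.nonzero(row_sums)[0]

# Small demonstration with made-up data.
mat = np.zeros((10, 5))
mat[3] = [1, 3, 5, 0, 9]
mat[7, 3] = 2
rows = nonzero_row_sums(mat, cols=[0, 3], chunk_rows=4)
```

This only helps if mat itself fits in memory (or is memory-mapped); it does not shrink the array, it only avoids the large temporary.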
I am working with image processing in Python and I want to output a variable. Right now the variable b is a NumPy array with shape (200,200). When I do print b, all I see is:
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
How do I print out the full contents of this array, write it to a file or something simple so I can just look at the contents in full?
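Of course, you can change the print threshold of the array as answered elsewhere. Note that threshold=np.nan, which older answers suggest, raises a ValueError in current NumPy; sys.maxsize works instead:
import sys
np.set_printoptions(threshold=sys.maxsize)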
But depending on what you're trying to look at, there's probably a better way to do that. For example, if your array truly is mostly zeros as you've shown, and you want to check whether it has values that are nonzero, you might look at things like:
import numpy as np
import matplotlib.pyplot as plt
In [1]: a = np.zeros((100,100))
In [2]: a
Out[2]:
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
Change some values:
In [3]: a[4:19,5:20] = 1
And it still looks the same:
In [4]: a
Out[4]:
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
Check some things that don't require manually looking at all values:
In [5]: a.sum()
Out[5]: 225.0
In [6]: a.mean()
Out[6]: 0.022499999999999999
Or plot it:
In [7]: plt.imshow(a)
Out[7]: <matplotlib.image.AxesImage at 0x1043d4b50>
Or save to a file:
In [11]: np.savetxt('file.txt', a)
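In the same spirit as the checks above, the nonzero entries can be located directly instead of being searched for by eye. A short sketch using the same 100x100 example; the name bbox is only illustrative.

```python
import numpy as np

a = np.zeros((100, 100))
a[4:19, 5:20] = 1

n_nonzero = np.count_nonzero(a)     # how many entries are nonzero
rows, cols = np.nonzero(a)          # coordinates of every nonzero entry

# Bounding box of the nonzero region: (first row, last row, first col, last col).
bbox = (int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max()))
```

Here bbox comes out as (4, 18, 5, 19), matching the slice a[4:19, 5:20] that was set to 1.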
# str() is needed because join() only accepts strings, not numbers
to_print = "\n".join(", ".join(str(value) for value in row) for row in b)
print(to_print)  # console
with open("path-to-file", "w") as f:
    f.write(to_print)  # to file
If b is a NumPy array, see also the question "Print the full numpy array".
How could the following MATLAB code be written using NumPy?
A = zeros(5, 100);
x = ones(5,1);
A(:,1) = x;
Assigning to rows seems to work easily, but I couldn't find an example of assigning an array to a column of another array.
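Use a[:,1] = x[:,0]. You need x[:,0] to select the column of x as a single one-dimensional array. (NumPy indices are zero-based, so MATLAB's first column A(:,1) is A[:,0] in NumPy; index 1 here addresses the second column.) If you have the choice of how to format x, it's better not to make it a 2-dimensional array in the first place, but just a regular one-dimensional array: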
>>> a
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
>>> x = numpy.ones(5)
>>> x
array([ 1., 1., 1., 1., 1.])
>>> a[:,1] = x
>>> a
array([[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.],
[ 0., 1., 0.]])
>>> A = np.zeros((5,100))
>>> x = np.ones((5,1))
>>> A[:,:1] = x
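The variants from these answers can be put side by side. This sketch keeps x as a (5, 1) column vector, as in the MATLAB code, and writes it into three different columns so that each assignment style is visible:

```python
import numpy as np

A = np.zeros((5, 100))
x = np.ones((5, 1))      # column vector, as in the MATLAB code

A[:, 0] = x[:, 0]        # select x's single column as a 1-D array
A[:, 1] = x.ravel()      # flatten x to shape (5,)
A[:, 2:3] = x            # slice keeps the target 2-D, so shapes (5, 1) match
```

The first line is the direct counterpart of MATLAB's A(:,1) = x, since column 1 in MATLAB is column 0 in NumPy.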