clearing elements of numpy array - python

Is there a simple way to clear all elements of a numpy array? I tried:
del arrayname
This removes the array completely. I am using this array inside a for loop that iterates thousands of times, so I prefer to keep the array but populate it with new elements every time.
I tried numpy.delete, but for my requirement I don't see the use of subarray specification.
*Edited*:
The array size is not going to be the same.
I allocate the space, inside the loop at the beginning, as follows. Please correct me if this is a wrong way to go about:
arrname = arange(x*6).reshape(x,6)
I read a dataset and construct this array for each tuple in the dataset. All I know is the number of columns is going to be the same but not the number of rows. For example, the first time I might need an array of size (3,6), for the next tuple as (1,6) and the next time as (4,6) and so on. The way I populate the array is as follows:
arrname[:,0] = lstname1
arrname[:,1] = lstname2
...
In other words, the columns are filled from lists constructed from the tuples. So, before the next loop begins I want to clear its elements and make it ready for the consecutive loop since I don't want remnants from the previous loop mixing the current contents.

I'm not sure what you mean by "clear". The array will always have some values stored in it, but you can set those values to something else, for example:
>>> A = numpy.array([[1, 2], [3, 4], [5, 6]], dtype=float)
>>> A
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
>>> A.fill(0)
>>> A
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
>>> A[:] = 1.
>>> A
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
Update
First, your question is very unclear. The more effort you put into writing a good question, the better the answers you'll get. A good question should make it clear to us what you're trying to do and why. Example data is also very helpful, even a small amount, so we can see exactly what you're trying to do.
That being said, it seems like maybe you should just create a new array for each iteration. Creating arrays is pretty fast, and it's not clear why you would want to reuse an array when the size and contents need to change. If you're trying to reuse it for performance reasons, you're probably not going to see any measurable difference: resizing an array is not noticeably faster than creating a new one. You can create a new array by calling numpy.zeros((X, 6)).
Also in your question you say:
the columns are filled from lists constructed from the tuples
If your data is already housed as a list of tuples, you can use numpy.array to convert it to an array. You don't need to go to the trouble of creating an array and filling it. For example, if I wanted to get a (2, 3) array from a list of tuples I would do:
data = [(0, 0, 1), (0, 0, 2)]
A = numpy.array(data)
# or if the data is stored like this
data = [(0, 0), (0, 0), (1, 2)]
A = numpy.array(data).T
Hope that helps.
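As a rough sketch of the "new array per iteration" idea applied to the question's column lists: the records list and the names col_a and col_b below are made up for illustration, and np.column_stack is one possible way to assemble the columns, not necessarily what the asker ends up using.

```python
import numpy as np

# Hypothetical per-record column lists; in the question these come from the dataset.
records = [
    ([1, 2, 3], [10, 20, 30]),
    ([4], [40]),
    ([5, 6, 7, 8], [50, 60, 70, 80]),
]

shapes = []
for col_a, col_b in records:
    # A fresh array is built on every iteration, so nothing from the
    # previous iteration can leak into the current one.
    arr = np.column_stack((col_a, col_b))
    shapes.append(arr.shape)

print(shapes)
```

Because each arr is created from scratch, the number of rows can differ freely between iterations and no explicit "clearing" step is needed.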

With a wag of the finger for possible premature optimization, I will offer some thoughts:
You say you don't want any remnants left over from previous iterations. From your code it looks like you populate every column of each new array, so "left over" values don't look like a problem. Consider:
Using arange and reshape serves no purpose. Use np.empty((n,6)) instead; it is faster than ones or zeros by a hair because it skips initialization.
You could alternatively construct your new array from its constituents.
See:
lstname1 = np.arange(3)
lstname2 = 22*np.arange(3)
np.vstack((lstname1,lstname2)).T
# returns
array([[ 0, 0],
[ 1, 22],
[ 2, 44]])
#or
np.hstack((lstname1[:,np.newaxis],lstname2[:,np.newaxis]))
array([[ 0, 0],
[ 1, 22],
[ 2, 44]])
Lastly, if you are really, really concerned about speed, you could allocate the largest expected size. (If it is not known, you could check the requested size against the largest so far and, if it is larger, use np.empty((rows,cols)) to increase the size.)
Then at each iteration, you create a view of the larger matrix containing just the number of rows you want. This causes numpy to reuse the same buffer and avoids any allocation in each of your iterations. Notice:
In [36]: big = np.vstack((lstname1,lstname2)).T
In [37]: smaller = big[:2]
In [38]: smaller[:,1]=33
In [39]: smaller
Out[39]:
array([[ 0, 33],
[ 1, 33]])
In [40]: big
Out[40]:
array([[ 0, 33],
[ 1, 33],
[ 2, 44]])
Note: these are suggestions that fit your expanded question with its clarification; they do not fit your earlier question about "clearing" the array. Even in the latter example you could easily call smaller.fill(0) to allay concerns, depending on whether you reliably reassign all elements of the array in your iterations.
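The buffer-reuse idea could be sketched roughly as follows. The initial capacity of 4 rows, the sequence of row counts, and the names buf, view and seen are all assumptions made for the example.

```python
import numpy as np

n_cols = 6
buf = np.empty((4, n_cols))  # initial capacity is an arbitrary guess

seen = []
for n_rows in (3, 1, 4, 7):  # hypothetical row counts, one per iteration
    if n_rows > buf.shape[0]:
        # Grow only when the request exceeds the largest size seen so far.
        buf = np.empty((n_rows, n_cols))
    view = buf[:n_rows]       # a view: no new allocation, shares buf's memory
    view.fill(0)              # optional, only needed if not every cell is reassigned
    view[:, 0] = np.arange(n_rows)
    seen.append((view.shape, np.shares_memory(view, buf)))

print(seen)
```

Whether this is measurably faster than allocating a new array each time depends on the sizes involved, so it is worth timing before adopting it.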

If you want to keep the array allocated, and with the same size, you don't need to clear the elements. Simply keep track of where you are, and overwrite the values in the array. This is the most efficient way of doing it.

I would simply begin putting the new values into the array.
But if you insist on clearing out the array, try making a new one of the same size using zeros or empty.
>>> A = numpy.array([[1, 2], [3, 4], [5, 6]])
>>> A
array([[1, 2],
[3, 4],
[5, 6]])
>>> A = numpy.zeros(A.shape)
>>> A
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])

Struggling with numpy libs where()

I somehow got mixed up with primitive AI and came across this code, which I am having a hard time understanding.
I read some sites but none seem to have the answer I am looking for. :(
Could anyone explain the np.where() function in this scenario?
It occurred to me that this line of code makes child_pos an empty 2d array
if curr_node.get_curr_child() == 0
But I am not sure... Glad for every response.
The code in question is:
child_pos = np.where(np.asarray(curr_node.get_curr_child()) == 0)[0][0]
Disregarding your code, np.where returns the positions of the values you are searching for in the where statement.
For example:
Let's assume
matrix = array([[1., 1., 1.],
[1., 0., 1.],
[1., 1., 0.]])
If we were to run np.where(matrix == 0) what we would get is
(array([1, 2], dtype=int64),
array([1, 2], dtype=int64))
Which basically gives you the row/column positions of the value 0 in the original 2-dimensional array. The first array represents the row positions and the second array represents the column positions.
This logic extends to higher/lower dimensions as well.
Returning to your code, you turn the result of get_curr_child into an np array and then you fetch the first value from the first dimension of the np.where result.
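To make the [0][0] indexing concrete, here is a small sketch. The array child is a made-up stand-in for whatever curr_node.get_curr_child() returns (assumed here to be one-dimensional), and the guard at the end is an addition that the original line does not have.

```python
import numpy as np

child = np.asarray([3, 0, 5, 0])   # stand-in for curr_node.get_curr_child()

matches = np.where(child == 0)     # tuple with one index array per dimension
first = matches[0][0]              # first dimension, then its first match
print(matches[0], first)

# If there is no zero at all, the index array is empty and [0][0] would
# raise an IndexError, so a guard may be useful:
no_zero = np.asarray([1, 2, 3])
idx = np.where(no_zero == 0)[0]
first_or_none = idx[0] if idx.size else None
print(first_or_none)
```

So child_pos is not an empty 2d array: it is the position of the first element equal to 0.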

Python equivalent to MATLAB's dynamic array initialization [duplicate]

I want to create an empty array and append items to it, one at a time.
xs = []
for item in data:
    xs.append(item)
Can I use this list-style notation with NumPy arrays?
That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly.
Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:
>>> import numpy as np
>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]
>>> a
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
A NumPy array is a very different data structure from a list and is designed to be used in different ways. Your use of hstack is potentially very inefficient... every time you call it, all the data in the existing array is copied into a new one. (The append function will have the same issue.) If you want to build up your matrix one column at a time, you might be best off to keep it in a list until it is finished, and only then convert it into an array.
e.g.
mylist = []
for item in data:
    mylist.append(item)
mat = numpy.array(mylist)
item can be a list, an array or any iterable, as long as each item has the same number of elements.
In this particular case (data is some iterable holding the matrix columns) you can simply use
mat = numpy.array(data)
(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
EDIT:
If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!
To create an empty multidimensional array in NumPy (e.g. a 2D m-by-n array to store your matrix) when you don't know in advance how many rows m you will append, and you don't care about the computational cost Stephen Simmons mentioned (namely rebuilding the array at each append), you can set to 0 the dimension along which you want to append: X = np.empty(shape=[0, n]).
This way you can use, for example (here m = 10, which we assume we didn't know when creating the empty matrix, and n = 2):
import numpy as np
n = 2
X = np.empty(shape=[0, n])
for i in range(5):
    for j in range(2):
        X = np.append(X, [[i, j]], axis=0)
print(X)
which will give you:
[[ 0. 0.]
[ 0. 1.]
[ 1. 0.]
[ 1. 1.]
[ 2. 0.]
[ 2. 1.]
[ 3. 0.]
[ 3. 1.]
[ 4. 0.]
[ 4. 1.]]
I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects and I needed it to be initialized empty... I didn't find any relevant answer here on Stack Overflow, so I started doodling something.
# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)
The result will be:
In [34]: x
Out[34]: array([], dtype=float64)
Therefore you can directly initialize an np array as follows:
In [36]: x= np.array([], dtype=np.float64)
I hope this helps.
For creating an empty NumPy array without defining its shape you can do the following:
arr = np.array([])
The first one is preferred because you know you will be using this as a NumPy array. NumPy converts this to np.ndarray type afterward, without extra [] 'dimension'.
To add a new element to the array you can do:
arr = np.append(arr, 'new element')
Note that under the hood there is no such thing as a NumPy array without a shape. As @hpaulj mentioned, this also creates a one-dimensional (rank-1) array, with shape (0,).
You can use the append function. For rows (note that a must already be 2-D, otherwise append with axis=0 raises a ValueError):
>>> from numpy import *
>>> a = array([[10, 20, 30]])
>>> a = append(a, [[1, 2, 3]], axis=0)
>>> a
array([[10, 20, 30],
       [ 1,  2,  3]])
For columns:
>>> append(a, [[15], [15]], axis=1)
array([[10, 20, 30, 15],
       [ 1,  2,  3, 15]])
EDIT
Of course, as mentioned in other answers, unless you're doing some processing (ex. inversion) on the matrix/array EVERY time you append something to it, I would just create a list, append to it then convert it to an array.
Here is a workaround to make NumPy arrays behave more like lists:
np_arr = np.array([])
np_arr = np.append(np_arr, 2)
np_arr = np.append(np_arr, 24)
print(np_arr)
OUTPUT: [ 2. 24.]
If you absolutely don't know the final size of the array, you can increment the size of the array like this:
my_arr = numpy.zeros((0,5))
for i in range(3):
    my_arr = numpy.concatenate((my_arr, numpy.ones((1, 5))))
print(my_arr)
[[ 1. 1. 1. 1. 1.]
 [ 1. 1. 1. 1. 1.]
 [ 1. 1. 1. 1. 1.]]
Notice the 0 in the first line.
numpy.append is another option. It calls numpy.concatenate.
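Since both numpy.append and numpy.concatenate copy the whole array on every call, a common alternative is to collect the pieces in a list and concatenate once at the end. The following is only a sketch: the chunks produced in the loop are invented placeholders for whatever each iteration yields.

```python
import numpy as np

chunks = []
for i in range(3):
    # Hypothetical (i + 1, 5) block produced by iteration i.
    chunks.append(np.full((i + 1, 5), float(i)))

# A single concatenate performs one copy, instead of one copy per iteration.
result = np.concatenate(chunks, axis=0)
print(result.shape)
```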
You can apply it to build any kind of array, like zeros:
a = range(5)
a = [i*0 for i in a]
print(a)
[0, 0, 0, 0, 0]
Depending on what you are using this for, you may need to specify the data type (see 'dtype').
For example, to create a 2D array of 8-bit values (suitable for use as a monochrome image):
myarray = numpy.empty(shape=(H,W),dtype='u1')
For an RGB image, include the number of color channels in the shape: shape=(H,W,3)
You may also want to consider zero-initializing with numpy.zeros instead of using numpy.empty. See the note here.
Another simple way to create an empty array that can hold arbitrary objects (including other arrays) is:
import numpy as np
np.empty((2,3), dtype=object)
I think you want to handle most of the work with lists and then use the result as a matrix. Maybe this is a way:
ur_list = []
for col in columns:
    ur_list.append(list(col))
mat = np.matrix(ur_list)
I think you can create empty numpy array like:
>>> import numpy as np
>>> empty_array= np.zeros(0)
>>> empty_array
array([], dtype=float64)
>>> empty_array.shape
(0,)
This format is useful when you want to append numpy array in the loop.
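Perhaps what you are looking for is something like this:
x = np.array([])
This creates an array without any elements, similar to:
x = []
Note that np.array(0) would not do this: it creates a zero-dimensional array holding the single value 0.
You can then add elements with np.append, which returns a new array rather than appending in place.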
The simplest way
Input:
import numpy as np
n_cols = 3  # example value; must match the number of columns in new_data
data = np.zeros((0, n_cols), dtype=float) # (rows,cols)
data.shape
Output:
(0, 3)
Input:
for i in range(n_files):
    data = np.append(data, new_data, axis=0)

Extract elements of an N-dimensional array within a list in Python

I have the following problem. I save a large amount of data within a class. Most of these data are time dependent and, in the most complex cases, the variables are 3-dimensional arrays.
Because lists are quite flexible (no need for explicit declaration), I wanted to use them to encapsulate my N-dimensional arrays and thus use lists to keep the time dependence information.
Here is a typical example of what I have for the elements t=0, t=2 and t=3 of my list, which is within the history class (a simple matrix of float64):
history.params[0]
array([[ 1. , 2. , 1. ],
[ 1. , 2. , 1. ],
[ 0.04877093, 0.53176887, 0.26210472],
[ 2.76110434, 1.3569246 , 3.118208 ]])
history.params[2]
array([[ 1.00000825, 1.99998362, 1.00012835],
[ 0.62113657, 0.47057772, 5.23074169],
[ 0.04877193, 0.53076887, 0.26210472],
[ 0.02762192, 4.99387138, 2.6654337 ]])
history.params[3]
array([[ 1.00005473, 1.99923187, 1.00008009],
[ 0.62713657, 0.47157772, 5.23074169],
[ 0.04677193, 0.53476887, 0.25210472],
[ 0.02462192, 4.89387138, 2.6654337 ]])
Now, how do I read/extract all elements at a given coordinate (x,y) of my matrix, for all the time indexes t?
I tried:
history.params[:][0][0]
and I get
array([ 1., 2., 1.])
Actually, wherever I place the colon, I always get the same values, which correspond to the first row of my matrix:
"history.params[0][:][0]" returns "array([ 1., 2., 1.])" in the shell
"history.params[0][0][:]" returns "array([ 1., 2., 1.])"
Why is Python not able to distinguish the elements of the matrix from the elements of the list here? What is the best solution?
Of course, I can write some loops and create a new variable that reorganizes my data, but that is a bit of a waste of energy. I am certain that an elegant solution exists.
PS: I am going to 'Cythonize' my code at some point, so if you have an optimized solution for Cython to store these variables, I would be very happy to hear it as well.
I would suggest using a single numpy array rather than a list of arrays.
import numpy as np
# Create some data which would be equal to your "params"
a = np.array([[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[11, 12, 13],
[14, 15, 16],
[17, 18, 19]]])
print(a[0])
# [[1 2 3]
# [4 5 6]
# [7 8 9]]
print(a[:, 0, 0])
# [ 1 11]
print(a[:, 0:2, 0])
# [[ 1  4]
#  [11 14]]
Furthermore numpy can be combined with Cython as given here.
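If the data already lives in a list of equally shaped arrays, one way to get there is np.stack. This is a sketch under the assumption that every time step has the same shape; params is a made-up stand-in for history.params.

```python
import numpy as np

# Stand-in for history.params: one (4, 3) array per time step.
params = [np.full((4, 3), float(t)) for t in range(5)]
params[2][1, 2] = 99.0

stacked = np.stack(params)      # shape (n_times, 4, 3)
series = stacked[:, 1, 2]       # value at coordinate (1, 2) for every time index
print(series)

# Without stacking, a list comprehension gives the same values:
same = [p[1, 2] for p in params]
```

Stacking copies the data once, so for a long history it may be preferable to store a single 3-D array from the start.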
Why Python is not able here to distinguish the elements of the matrix, from the elements of the list? What is the best solution?
Because history.params is a plain list, slicing it with [:] just returns a (shallow) copy of the whole list, so history.params[:][0][0] is the same as history.params[0][0]. Likewise, history.params[0][:] is a full slice of the first matrix, so history.params[0][:][0] is again its first row, and history.params[0][0][:] is a full slice of that same row. Chained brackets are applied one after another; a colon in one bracket never selects "all time indexes" for the brackets that follow.
If you want to flatten your list, i.e. store a 2D array as one list, each element at (n,m) of a 2D array of size NxM becomes element (n*M + m) in your flattened array. So in a 4x3 array as you have posted, element (0,0) is element 0 of the flat list, element (2,1) is element 2*3+1 = 7, and so forth.
You can extend that to 3D arrays: for an array of size KxNxM, index (k,n,m) is element (k*N*M + n*M + m); and similarly for higher dimensionalities.
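The row-major flat index k*N*M + n*M + m can be checked against NumPy directly. The sizes and the chosen index below are arbitrary, and np.ravel_multi_index is used only as an independent cross-check.

```python
import numpy as np

K, N, M = 2, 4, 3
cube = np.arange(K * N * M).reshape(K, N, M)   # C (row-major) order
flat = cube.ravel()

k, n, m = 1, 2, 1
manual = k * N * M + n * M + m                 # 12 + 6 + 1

# Both the flattened data and NumPy's own index arithmetic agree.
assert flat[manual] == cube[k, n, m]
assert manual == np.ravel_multi_index((k, n, m), (K, N, M))
print(manual)
```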

Efficient way to drop a column from a Numpy array?

If I have a very large numpy array with one useless column, how could I drop it without creating a copy of the original array?
np.delete(my_np_array, 0, 1)
The above code will return a copy of the array without the zero-th column. But instead I would like to simply delete that column from my_np_array since I don't need it. For very large datasets, the memory management becomes important and copying may not be an option.
If memory is the main concern, you can move columns around within your array so that the unneeded column ends up at the very end, then use ndarray.resize, which modifies the array in place, to shrink it and discard the last column.
You cannot simply remove the first column of an array in-place using the provided API, and I suspect it is because of the memory layout of an ndarray that maps multidimensional indexing to unidimensional byte-oriented addressing within blocks of contiguous memory.
The following example copies the last column into the first and then deletes the last (now unneeded) one, immediately releasing the associated memory. It removes the obsolete column from memory completely, at the cost of changing your column order. Note that resize truncates the underlying buffer in memory order, so this drops the last column only when A is stored column-major (Fortran order); for a default row-major array it would instead drop the trailing elements of the flattened data and scramble the rows.
D1, D2 = A.shape
A[:, 0] = A[:, D2-1]
A.resize((D1, D2-1), refcheck=False)
A.shape
# => would be (5, 4) if the shape was initially (5, 5) for example
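A small sketch of the same trick on a column-major array, where the last column really is the tail of the buffer. The 5x4 test data and the name original are assumptions for the example; refcheck=False is kept from the answer above and is only safe when no other object still refers to the array's memory.

```python
import numpy as np

# Column-major copy that owns its data (in-place resize requires both).
A = np.array(np.arange(20).reshape(5, 4), order='F')
original = A.copy()

rows, cols = A.shape
A[:, 0] = A[:, cols - 1]                      # overwrite the unwanted first column
A.resize((rows, cols - 1), refcheck=False)    # cut the last column off the buffer

print(A.shape)
```

With a default C-ordered array the same calls would run without error but leave the remaining values in the wrong places, so it is worth checking A.flags.f_contiguous first.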
If you use slicing numpy won't make a copy; in other words
a = numpy.array([1, 2, 3, 4, 5])
b = a[1:] # view elements from second to last, NOT making a copy
b[0] = 12 # Change first element of `b`, i.e. second of `a`
print(a)
will print [ 1 12  3  4  5]
If you need to delete an element in the middle, however, a single slice won't work.
NumPy arrays have a fixed size once created, so in general they can't be resized without creating an intermediate copy.
See also: How to remove specific elements in a numpy array
Creating a view with slicing and then making a copy of that is probably the fastest you can do.
In [804]: a = np.ones((2,2))
In [805]: a
Out[805]:
array([[ 1., 1.],
[ 1., 1.]])
In [806]: np.resize(a,(3,2))
Out[806]:
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
In [807]: a  # unchanged: np.resize returned a new array instead of resizing in place
Out[807]:
array([[ 1., 1.],
[ 1., 1.]])

