Deleting multiple elements at once from a numpy 2d array

Is there a way to delete from a numpy 2d array when I have the indexes? For example:
a = np.random.random((4,5))
idxs = [(0,1), (1,3), (2, 1), (3,4)]
I want to remove the indexes specified above. I tried:
np.delete(a, idxs)
but it just removes the top row.
To give an example, for the following input:
[
[0.15393912, 0.08129568, 0.34958515, 0.21266128, 0.92372852],
[0.42450441, 0.1027468 , 0.13050591, 0.60279229, 0.41168151],
[0.06330729, 0.60704682, 0.5340644 , 0.47580567, 0.42528617],
[0.27122323, 0.42713967, 0.94541073, 0.21462462, 0.07293321]
]
and with the indexes as mentioned above, I want the result to be:
[
[0.15393912, 0.34958515, 0.21266128, 0.92372852],
[0.42450441, 0.1027468 , 0.13050591, 0.41168151],
[0.06330729, 0.5340644 , 0.47580567, 0.42528617],
[0.27122323, 0.42713967, 0.94541073, 0.21462462]
]

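Without an axis argument, np.delete operates on the flattened array, so the indices must be flat indices; with an axis it can only remove whole rows or columns.
Here is how you can convert the (row, column) pairs to flat indices and use them: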
arr = np.array([
[0.15393912, 0.08129568, 0.34958515, 0.21266128, 0.92372852],
[0.42450441, 0.1027468 , 0.13050591, 0.60279229, 0.41168151],
[0.06330729, 0.60704682, 0.5340644 , 0.47580567, 0.42528617],
[0.27122323, 0.42713967, 0.94541073, 0.21462462, 0.07293321]
])
idxs = [(0,1), (1,3), (2, 1), (3,4)]
idxs = [i*arr.shape[1]+j for i, j in idxs]
np.delete(arr, idxs).reshape(4,4)
The reshape only works if the deletion leaves a rectangular result, that is, the same number of elements is removed from every row (here, one per row).
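The flat-index conversion above can also be done with np.ravel_multi_index instead of the hand-written i*ncols+j formula. This is a minimal sketch, assuming exactly one index per row; the arange test array is a stand-in for the random data in the question.

```python
import numpy as np

arr = np.arange(20.0).reshape(4, 5)   # stand-in data, not the question's values
idxs = [(0, 1), (1, 3), (2, 1), (3, 4)]

# Split the pairs into a tuple of row indices and a tuple of column indices.
rows, cols = zip(*idxs)

# Convert (row, col) pairs to positions in the flattened array.
flat = np.ravel_multi_index((rows, cols), arr.shape)

# Delete from the flattened array, then restore the 2D shape.
result = np.delete(arr, flat).reshape(arr.shape[0], arr.shape[1] - 1)
print(result)
```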

Numpy doesn't know that you are removing exactly one element per row when you give it arbitrary indices like that. Since you do know that, I would suggest using a mask to shrink the array. Masking has the same problem: it doesn't assume anything about the shape of the result (because it can't in general), and returns a raveled array. You can reinstate the shape you want quite easily though. In fact, I would suggest removing the first element of each index entirely, since you have one per row:
def remove_indices(a, idx):
    # idx holds one column index per row of a
    if len(idx) != len(a):
        raise ValueError('Wrong number of indices')
    mask = np.ones(a.shape, dtype=np.bool_)
    mask[np.arange(len(idx)), idx] = False
    return a[mask].reshape(a.shape[0], a.shape[1] - 1)
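If you would rather keep the original (row, column) pairs, the same masking idea can be sketched as follows. The arange array is a stand-in for the question's data, and the reshape still assumes exactly one deletion per row.

```python
import numpy as np

a = np.arange(20.0).reshape(4, 5)     # stand-in data
idxs = [(0, 1), (1, 3), (2, 1), (3, 4)]

# Transposing the (N, 2) index array yields the row and column index arrays.
rows, cols = np.array(idxs).T

# Boolean mask that is False exactly at the positions to drop.
mask = np.ones(a.shape, dtype=bool)
mask[rows, cols] = False

# Boolean indexing returns a flat array; restore the row structure.
result = a[mask].reshape(a.shape[0], -1)
print(result)
```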

Here is a method using np.where
import numpy as np
import operator as op
a = np.arange(20.0).reshape(4,5)
idxs = [(0,1), (1,3), (2, 1), (3,4)]
m,n = a.shape
# extract column indices
# there are simpler ways but this is fast
columns = np.fromiter(map(op.itemgetter(1),idxs),int,m)
# build decimated array
result = np.where(columns[...,None]>np.arange(n-1),a[...,:-1],a[...,1:])
result
# array([[ 0., 2., 3., 4.],
# [ 5., 6., 7., 9.],
# [10., 12., 13., 14.],
# [15., 16., 17., 18.]])

As the documentation says:
Return a new array with sub-arrays along an axis deleted.
np.delete deletes whole rows or columns, depending on the value of the axis parameter.
Secondly, np.delete expects an int or an array of ints as its index argument, not a list of tuples.
You need to specify what the requirement is.
As @divakar suggested, look at other answers on Stack Overflow about deleting individual items from a NumPy array.

Related

Calculate the mean over a mixed data structure

I have a list of lists that looks something like:
data = [
[1., np.array([2., 3., 4.]), ...],
[5., np.array([6., 7., 8.]), ...],
...
]
where each of the internal lists is the same length and contains the same data type/shape at each entry. I would like to calculate the mean over corresponding entries and return something of the same structure as the internal lists. For example, in the above case (assuming only two entries) I want the result to be:
[3., np.array([4., 5., 6.]), ...]
What is the best way to do this with Python?

data is a list, so a list comprehension seems like a natural option. Even if it were a numpy array, given that it's a jagged array, it wouldn't benefit from being wrapped in an ndarray anyway, so a list comp would still be the best option, in my opinion.
Anyway, use zip() to "transpose" data and call np.mean() in a loop to find mean along the first axis.
[np.mean(x, axis=0) for x in zip(*data)]
# [3.0, array([4., 5., 6.]), array([[2., 2.], [2., 2.]])]
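For completeness, here is the one-liner run end to end. The third entry of each inner list is an assumption standing in for the "..." in the question; any entries work as long as corresponding positions share a shape.

```python
import numpy as np

data = [
    [1., np.array([2., 3., 4.]), np.array([[1., 1.], [1., 1.]])],
    [5., np.array([6., 7., 8.]), np.array([[3., 3.], [3., 3.]])],
]

# zip(*data) groups the first entries together, then the second entries, etc.
means = [np.mean(group, axis=0) for group in zip(*data)]
print(means)
```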

If you have a list exactly like the one shown in the example, you can do it with the following code.
First we declare some variables to store our results:
number_sum = 0
list_sum = np.zeros(3)
It is important that list_sum starts at zero and has the same length as the arrays in data. That is, if each array contains 5 elements, it should be: list_sum = np.zeros(5). (np.zeros gives a float array; an integer array such as np.array([0,0,0]) would silently truncate fractional values when they are added in place.)
The next step is to sum all the elements in data. First we add the scalar values and then we add each element of the array, as follows:
for number, nparray in data:
    number_sum += number
    for index, item in enumerate(nparray):
        list_sum[index] += item
Since we know how data is structured (each entry is made up of a scalar and an np.array), we can do the addition that way. Be careful with longer arrays, though: the inner Python loop visits every array element one at a time, which is much slower than NumPy's vectorized arithmetic.
Finally, you can check that if you divide the sum of the elements by the length of data you get the desired value:
print(number_sum/len(data))
print(list_sum/len(data))
Now you just have to add those two new values to a new list. I hope it helps, greetings and good luck!

The following works:
import numpy as np
data = [
[1., np.array([2., 3., 4.]), np.array([[1., 1.], [1., 1.]])],
[5., np.array([6., 7., 8.]), np.array([[3., 3.], [3., 3.]])],
]
number_of_samples = len(data)
number_of_elements = len(data[0])
means = []
for ielement in range(number_of_elements):
    mean_list = []
    for isample in range(number_of_samples):
        mean_list.append(data[isample][ielement])
    mean_list = np.stack(mean_list)
    mean = mean_list.mean(axis=0)
    means.append(mean)
print(means)
but it is a bit ugly, nests for loops, and does not seem very Pythonic. Any improvements over this are welcome.

How does numpy array typing interact with object?

I am currently trying to implement a datatype that stores floats in a NumPy array. However, creating such an array from rows of different lengths breaks the code: NumPy would have to assign a sequence to a single array element, which is not possible.
One can bypass this by using the data type object instead of float. Why is that? How could one resolve this problem using floats without creating a sequence?
Example code that does not work:
from numpy import *
foo= dtype(float32, [])
x = array([[2., 3.], [3.]], dtype=foo)
Example code that does work:
from numpy import *
foo= dtype(float32, [])
x = array([[2., 3.], [3., 2.]], dtype=foo)
Example code that does work, and which I am trying to replicate for float:
from numpy import *
foo= dtype(object, [])
x = array([[2., 3.], [3.]], dtype=foo)

The object dtype in Numpy simply creates an array of pointers to Python objects. This means you lose the performance advantage you usually get from Numpy, but it's still sometimes useful to do this.
Your last example creates a one-dimensional NumPy array of length two, so that's two pointers to Python objects. Both these objects happen to be lists, and Python lists have arbitrary, dynamic length.
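A quick way to see this for yourself is to inspect the object array directly. This is only an illustrative sketch; the appended value 7. is arbitrary.

```python
import numpy as np

x = np.array([[2., 3.], [3.]], dtype=object)

print(x.shape)       # one-dimensional: two object slots
print(type(x[0]))    # each slot holds an ordinary Python list

# The array only stores references, so the lists can still grow.
x[1].append(7.)
print(x)
```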

I don't know what you were trying to achieve with this, but note that
>>> np.dtype(np.float32, []) == np.float32
True
Arrays require the same number of elements for each row. So, if you feed a list of lists in numpy and all sublists have the same number of elements, it'll happily convert it to an array. This is why your second example works.
If the sublists are not the same length, then each sublist is treated as a single object and you end up with a 1D array of objects (recent NumPy versions require dtype=object to be given explicitly for this, as in your third example). This is why your third example works. Your first example doesn't work because you try to cast a sequence of objects to floats, which isn't possible.
In short, you can't create an array of floats if your sublists are of different lengths. At best, you can create an array of 1D arrays, since they are still considered objects.
>>> x = np.array(list(map(np.array, [[2., 3.], [3.]])), dtype=object)
>>> x
array([array([ 2., 3.]), array([ 3.])], dtype=object)
>>> x[0]
array([ 2., 3.])
>>> x[0][1]
3.0
>>> # but you can't do this
>>> x[0,1]
Traceback (most recent call last):
File "<pyshell#18>", line 1, in <module>
x[0,1]
IndexError: too many indices for array
If you're bent on creating a float 2D array, you have to extend all your sublists to the same size with None, which will be converted to np.nan.
>>> lists = [[2., 3.], [3.]]
>>> max_len = max(map(len, lists))
>>> for i, sublist in enumerate(lists):
...     sublist = sublist + [None] * (max_len - len(sublist))
...     lists[i] = sublist
>>> np.array(lists, dtype=np.float32)
array([[ 2., 3.],
[ 3., nan]], dtype=float32)
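An alternative sketch of the same padding idea, which avoids going through None: start from an array pre-filled with NaN and copy each sublist into the leading slots of its row. The name padded is just an illustrative choice.

```python
import numpy as np

lists = [[2., 3.], [3.]]
max_len = max(len(row) for row in lists)

# Start from an all-NaN array and copy each row into its leading slots.
padded = np.full((len(lists), max_len), np.nan, dtype=np.float32)
for i, row in enumerate(lists):
    padded[i, :len(row)] = row

print(padded)
```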

Python equivalent to MATLAB's dynamic array initialization [duplicate]

I want to create an empty array and append items to it, one at a time.
xs = []
for item in data:
    xs.append(item)
Can I use this list-style notation with NumPy arrays?

That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly.
Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:
>>> import numpy as np
>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]
>>> a
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
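When the number of rows is known up front, the same preallocate-and-fill pattern can be written as a loop. This is a sketch only; rows is a hypothetical stand-in for whatever produces your data.

```python
import numpy as np

rows = [[1, 2], [3, 4], [5, 6]]     # hypothetical input, one entry per row

# Allocate the full array once, then assign row by row.
a = np.zeros((len(rows), 2))
for i, row in enumerate(rows):
    a[i] = row

print(a)
```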

A NumPy array is a very different data structure from a list and is designed to be used in different ways. Your use of hstack is potentially very inefficient... every time you call it, all the data in the existing array is copied into a new one. (The append function will have the same issue.) If you want to build up your matrix one column at a time, you might be best off to keep it in a list until it is finished, and only then convert it into an array.
e.g.
mylist = []
for item in data:
    mylist.append(item)
mat = numpy.array(mylist)
item can be a list, an array or any iterable, as long
as each item has the same number of elements.
In this particular case (data is some iterable holding the matrix columns) you can simply use
mat = numpy.array(data)
(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
EDIT:
If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!

To create an empty multidimensional array in NumPy (e.g. a 2D array m*n to store your matrix), in case you don't know m, the number of rows you will append, and don't care about the computational cost Stephen Simmons mentioned (namely rebuilding the array at each append), you can set to 0 the dimension along which you want to append: X = np.empty(shape=[0, n]).
This way you can use for example (here m = 5 which we assume we didn't know when creating the empty matrix, and n = 2):
import numpy as np
n = 2
X = np.empty(shape=[0, n])
for i in range(5):
    for j in range(2):
        X = np.append(X, [[i, j]], axis=0)
print(X)
which will give you:
[[ 0. 0.]
[ 0. 1.]
[ 1. 0.]
[ 1. 1.]
[ 2. 0.]
[ 2. 1.]
[ 3. 0.]
[ 3. 1.]
[ 4. 0.]
[ 4. 1.]]

I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects and I needed it to be initialized empty... I didn't find any relevant answer here on Stack Overflow, so I started doodling something.
# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)
The result will be:
In [34]: x
Out[34]: array([], dtype=float64)
Therefore you can directly initialize an np array as follows:
In [36]: x= np.array([], dtype=np.float64)
I hope this helps.

To create an empty NumPy array without defining its shape, you can do the following:
arr = np.array([])
This form is preferred because you know you will be using this as a NumPy array. NumPy converts this to np.ndarray type afterward, without extra [] 'dimension'.
To add a new element to the array, you can do:
arr = np.append(arr, 'new element')
Note that behind the scenes there is no such thing as an array without a defined shape. As @hpaulj mentioned, this also makes a rank-one array.

You can use the append function. For rows:
>>> from numpy import *
>>> a = array([[10,20,30]])
>>> a = append(a, [[1,2,3]], axis=0)
>>> a
array([[10, 20, 30],
       [ 1,  2,  3]])
For columns:
>>> append(a, [[15],[15]], axis=1)
array([[10, 20, 30, 15],
       [ 1,  2,  3, 15]])
EDIT
Of course, as mentioned in other answers, unless you're doing some processing (ex. inversion) on the matrix/array EVERY time you append something to it, I would just create a list, append to it then convert it to an array.

Here is some workaround to make numpys look more like Lists
np_arr = np.array([])
np_arr = np.append(np_arr , 2)
np_arr = np.append(np_arr , 24)
print(np_arr)
OUTPUT: [ 2. 24.]

If you absolutely don't know the final size of the array, you can increment the size of the array like this:
import numpy
my_arr = numpy.zeros((0,5))
for i in range(3):
    my_arr = numpy.concatenate((my_arr, numpy.ones((1,5))))
print(my_arr)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
Notice the 0 in the shape passed to numpy.zeros in the first line.
numpy.append is another option. It calls numpy.concatenate.
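Because each concatenate (or append) call copies everything accumulated so far, a common variation is to collect the pieces in a Python list and concatenate once at the end. A minimal sketch, where the (1, 5) blocks are made-up stand-ins for your real rows:

```python
import numpy as np

blocks = []
for i in range(3):
    # Hypothetical (1, 5) block; in practice this would be your new row(s).
    blocks.append(np.full((1, 5), float(i)))

# A single concatenate along axis 0 copies the data only once.
my_arr = np.concatenate(blocks)
print(my_arr.shape)
```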

You can apply it to build any kind of array, like zeros:
a = range(5)
a = [i*0 for i in a]
print(a)
[0, 0, 0, 0, 0]

Depending on what you are using this for, you may need to specify the data type (see 'dtype').
For example, to create a 2D array of 8-bit values (suitable for use as a monochrome image):
myarray = numpy.empty(shape=(H,W),dtype='u1')
For an RGB image, include the number of color channels in the shape: shape=(H,W,3)
You may also want to consider zero-initializing with numpy.zeros instead of using numpy.empty. See the note here.

Another simple way to create an empty array that can hold arbitrary objects, including other arrays, is:
import numpy as np
np.empty((2,3), dtype=object)

I think you want to handle most of the work with lists then use the result as a matrix. Maybe this is a way ;
ur_list = []
for col in columns:
    ur_list.append(list(col))
mat = np.matrix(ur_list)

I think you can create empty numpy array like:
>>> import numpy as np
>>> empty_array= np.zeros(0)
>>> empty_array
array([], dtype=float64)
>>> empty_array.shape
(0,)
This format is useful when you want to append numpy array in the loop.

Perhaps what you are looking for is something like this:
x=np.array([])
This way you create an array without any elements (np.array(0), by contrast, gives a zero-dimensional array that holds the value 0). It is similar to:
x=[]
You can then add new elements with np.append, which returns a new array each time.

The simplest way
Input:
import numpy as np
data = np.zeros((0, 0), dtype=float) # (rows,cols)
data.shape
Output:
(0, 0)
Input:
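for i in range(n_files):
    data = np.append(data, new_data, axis = 0)
Note that np.append along axis 0 requires new_data to have the same number of columns as data, so with a (0, 0) start this only works if new_data also has 0 columns; in practice, initialize data with the real column count, e.g. np.zeros((0, new_data.shape[1])).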

Efficient way to drop a column from a Numpy array?

If I have a very large numpy array with one useless column, how could I drop it without creating a copy of the original array?
np.delete(my_np_array, 0, 1)
The above code will return a copy of the array without the zero-th column. But instead I would like to simply delete that column from my_np_array since I don't need it. For very large datasets, the memory management becomes important and copying may not be an option.

If memory is the main concern, what you can do is move columns around within your array so that the unneeded column ends up at the very end, then use ndarray.resize, which modifies the array in-place, to shrink it and discard the last column.
You cannot simply remove the first column of an array in-place using the provided API, and I suspect it is because of the memory layout of an ndarray, which maps multidimensional indexing to one-dimensional, byte-oriented addressing within a block of contiguous memory.
The following example copies the last column into the first and then deletes the last (now unneeded) one, releasing the associated memory. So it removes the obsolete column from memory completely, at the cost of changing your column order. Note that this relies on A being stored in column-major (Fortran) order, so that the last column sits at the end of the memory block; resize truncates the data in memory order, so for a default row-major (C-order) array it would cut off the tail of the flattened data and scramble the rows instead.
D1, D2 = A.shape
A[:, 0] = A[:, D2-1]
A.resize((D1, D2-1), refcheck=False)
A.shape
# => would be (5, 4) if the shape was initially (5, 5) for example
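To make the memory-order dependence concrete, here is a small sketch of the trick on an array that is explicitly created in column-major order and owns its data. The 5x5 arange contents are only there so the result is easy to read.

```python
import numpy as np

# Column-major (Fortran-order) array that owns its data.
A = np.empty((5, 5), order='F')
A[...] = np.arange(25.0).reshape(5, 5)

D1, D2 = A.shape
A[:, 0] = A[:, D2 - 1]                   # overwrite the unwanted first column
A.resize((D1, D2 - 1), refcheck=False)   # drop the trailing column in place
print(A)
```

The first column now holds what used to be the last column, and the remaining columns keep their positions.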

If you use slicing, NumPy won't make a copy; in other words,
a = numpy.array([1, 2, 3, 4, 5])
b = a[1:] # view elements from second to last, NOT making a copy
b[0] = 12 # Change first element of `b`, i.e. second of `a`
print(a)
will print [ 1 12  3  4  5].
If you need to delete an element in the middle, however, a single slice won't work.
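The difference between a slice and np.delete can be checked with np.shares_memory. A small sketch on a made-up 3x4 array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:, 1:]                   # dropping the first column is a plain slice
copy = np.delete(a, 2, axis=1)    # dropping a middle column builds a new array

print(np.shares_memory(a, view))  # the slice reuses a's buffer
print(np.shares_memory(a, copy))  # the np.delete result does not
```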

NumPy arrays have a fixed size once created, so in general they can't be resized without creating an intermediate copy.
See also: How to remove specific elements in a numpy array
Creating a view with slicing and making a copy of that is probably the fastest you can do.
In [804]: a = np.ones((2,2))
In [805]: a
Out[805]:
array([[ 1., 1.],
[ 1., 1.]])
In [806]: np.resize(a,(3,2))
Out[806]:
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
In [807]: a  # a would now be resized if it had been done in place
Out[807]:
array([[ 1., 1.],
[ 1., 1.]])
