Is there a numpy method which is equivalent to the builtin pop for python lists?
Popping obviously doesn't work on numpy arrays, and I want to avoid a list conversion.
There is no pop method for NumPy arrays, but you could just use basic slicing (which would be efficient since it returns a view, not a copy):
In [104]: y = np.arange(5); y
Out[104]: array([0, 1, 2, 3, 4])
In [105]: last, y = y[-1], y[:-1]
In [106]: last, y
Out[106]: (4, array([0, 1, 2, 3]))
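If there were a pop method, it would return the last value in y and shrink y in place.
Above,
last, y = y[-1], y[:-1]
assigns the last value to the variable last and rebinds y to a view of all but the last element. The underlying buffer is not resized, so any other reference to the original array still sees all five elements.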
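The slicing idiom above can be wrapped in a small helper. This is only a sketch: the name pop_last is hypothetical and not part of NumPy, and the result is a view, so it shares memory with the original array.

```python
import numpy as np

def pop_last(arr):
    """Return (last_value, view_without_last); hypothetical helper, not a NumPy function."""
    if arr.shape[0] == 0:
        raise IndexError("pop from empty array")
    return arr[-1], arr[:-1]

y = np.arange(5)
original = y                  # keep a second reference to the full array
last, y = pop_last(y)
print(last, y)                # 4 [0 1 2 3]
print(np.shares_memory(y, original))  # True: y is a view, no data was copied
```

Because y is a view, writing into y also changes original; call .copy() on the result if that is not wanted.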
Here is one example using numpy.delete():
import numpy as np
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(arr)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
arr = np.delete(arr, 1, 0)  # delete index 1 along axis 0, i.e. the second row
print(arr)
# [[ 1  2  3  4]
#  [ 9 10 11 12]]
Pop doesn't exist for NumPy arrays, but you can use NumPy indexing in combination with array restructuring, for example hstack/vstack or numpy.delete(), to emulate popping.
Here are some example functions I can think of. Negative indices are converted to their positive equivalent first, because with i = -1 the slice my_array[i+1:] would otherwise select the whole array:
def poprow(my_array, pr):
    """ row popping in numpy arrays
    Input: my_array - NumPy array, pr: row index to pop out
    Output: [new_array, popped_row] """
    # map a negative index (e.g. -1) to its positive equivalent
    i = pr if pr >= 0 else pr + my_array.shape[0]
    pop = my_array[i]
    new_array = np.vstack((my_array[:i], my_array[i+1:]))
    return [new_array, pop]
def popcol(my_array, pc):
    """ column popping in numpy arrays
    Input: my_array: NumPy array, pc: column index to pop out
    Output: [new_array, popped_col] """
    # map a negative index (e.g. -1) to its positive equivalent
    i = pc if pc >= 0 else pc + my_array.shape[1]
    pop = my_array[:, i]
    new_array = np.hstack((my_array[:, :i], my_array[:, i+1:]))
    return [new_array, pop]
This returns the array without the popped row/column, as well as the popped row/column separately:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> [A,poparow] = poprow(A,0)
>>> poparow
array([1, 2, 3])
>>> A = np.array([[1,2,3],[4,5,6]])
>>> [A,popacol] = popcol(A,2)
>>> popacol
array([3, 6])
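As a possible alternative to the vstack/hstack approach, the same row/column pop can be sketched with np.take and np.delete, both of which accept negative indices. The helper name pop_along_axis is hypothetical.

```python
import numpy as np

def pop_along_axis(arr, index, axis=0):
    """Return (new_array, popped) for one index along `axis`; hypothetical helper."""
    popped = np.take(arr, index, axis=axis)        # negative indices count from the end
    new_array = np.delete(arr, index, axis=axis)   # np.delete returns a copy
    return new_array, popped

A = np.array([[1, 2, 3], [4, 5, 6]])
rest, last_col = pop_along_axis(A, -1, axis=1)
print(last_col)   # [3 6]
print(rest)
```

The input array A is left unchanged; the caller decides whether to rebind it to the returned array.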
Unlike lists, NumPy arrays have no pop() method. Here are some alternatives you can try:
Using basic slicing
>>> x = np.array([1,2,3,4,5])
>>> x = x[:-1]; x
array([1, 2, 3, 4])
Or, by using delete()
Syntax - np.delete(arr, obj, axis=None)
arr: Input array
obj: Index (or indices) to delete along the given axis
axis: Axis along which to delete; if None, the array is flattened first
>>> x = np.array([1,2,3,4,5])
>>> x = np.delete(x, len(x)-1, 0); x
array([1, 2, 3, 4])
The important thing is that pop takes one element from the original array and removes it.
If you don't mind emulating that with two expressions instead of a single method call, the following code will do what you want.
import numpy as np
a = np.arange(0, 3)
i = 0
selected, others = a[i], np.delete(a, i)
print(selected)
print(others)
# result:
# 0
# [1 2]
The most 'elegant' solution for retrieving and removing a random item in Numpy is this:
import numpy as np
import random
arr = np.array([1, 3, 5, 2, 8, 7])
element = random.choice(arr)
elementIndex = np.where(arr == element)[0][0]
arr = np.delete(arr, elementIndex)
For curious coders:
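np.where(condition) returns a tuple of index arrays, one per dimension of the input. For a 2D array, the first holds the row indices of the matching elements and the second the column indices, which is useful when searching a 2D array. Our array is 1D, so the tuple holds a single array, and its first element is the index of the first match. Note that if arr contains duplicates, this removes the first occurrence of the chosen value.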
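If the search through np.where is not needed, one option is to draw a random index instead of a random value. The sketch below assumes NumPy's Generator API (np.random.default_rng); the seed is arbitrary and only there for repeatability.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, for repeatable runs only
arr = np.array([1, 3, 5, 2, 8, 7])

index = rng.integers(arr.size)        # random position in [0, arr.size)
element = arr[index]                  # the "popped" value
remaining = np.delete(arr, index)     # copy of arr without that position
print(element, remaining)
```

Drawing the position directly also behaves predictably when the array contains duplicate values.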
To add: if you want to implement pop for a row or column of a 2D NumPy array, you could do:
col = arr[:, -1] # gets the last column
arr = np.delete(arr, -1, 1) # np.delete returns a copy without the last column
and for a row:
row = arr[-1, :] # gets the last row
arr = np.delete(arr, -1, 0) # np.delete returns a copy without the last row
unutbu had a simple answer for this, but pop() can also take an index as a parameter. This is how you replicate it with numpy:
pop_index = 4
pop = y[pop_index]
y = np.concatenate([y[:pop_index],y[pop_index+1:]])
OK, since I didn't see a good answer that RETURNS the 1st element and REMOVES it from the original array, I wrote a simple (if kludgy) function utilizing global for a 1d array (modification required for multidims):
tmp_array_for_popfunc = np.array([10, 20, 30])  # placeholder: put your own 1d array here
def array_pop():
    global tmp_array_for_popfunc
    r = tmp_array_for_popfunc[0]
    tmp_array_for_popfunc = np.delete(tmp_array_for_popfunc, 0)
    return r
Check it with:
print(len(tmp_array_for_popfunc)) # confirm initial size of tmp_array_for_popfunc
print(array_pop()) #prints return value at tmp_array_for_popfunc[0]
print(len(tmp_array_for_popfunc)) # now size is 1 smaller
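The global can be avoided by keeping the array as instance state. The class below is a hypothetical sketch (PoppableArray is not a NumPy type) for 1D data; each pop still copies the remaining elements, because np.delete returns a new array.

```python
import numpy as np

class PoppableArray:
    """Hypothetical wrapper that holds the current 1D array as instance state."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def pop(self, index=0):
        value = self.data[index]
        self.data = np.delete(self.data, index)  # rebind: np.delete returns a copy
        return value

    def __len__(self):
        return self.data.size

queue = PoppableArray([10, 20, 30])
first = queue.pop()
print(first, len(queue))  # 10 2
```

The default index of 0 mirrors the function above, which removes the first element; pass -1 to mimic list.pop().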
I made a function as follows that does almost the same. It takes two arguments, np_array and index, and returns the value at the given index together with a new array that no longer contains it. np.delete returns a copy rather than modifying its input, so the caller has to rebind the array.
def np_pop(np_array, index=-1):
    '''
    Pop the "index" from np_array and return (value, new_array).
    Default value for index is the last element.
    '''
    # add this to make sure 'numpy' is imported
    import numpy as np
    # read the value of the given array at the given index
    value = np_array[index]
    # build a copy without that index; the input array is left untouched
    new_array = np.delete(np_array, index, 0)
    # return the value and the shortened array
    return value, new_array
Remember that you can add a check to make sure the given index exists in the array; as written, an out-of-range index raises IndexError.
Now you can use it like this:
import numpy as np
i = 2 # let's assume we want to pop index number 2
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]) # assume 'y' is our numpy array
popped_val, y = np_pop(y, i) # value at the popped index, and the shortened array
Related
I have an array of arbitrary length, and I want to select N elements of it, evenly spaced out (approximately, as N may be even, array length may be prime, etc) that includes the very first arr[0] element and the very last arr[len-1] element.
Example:
>>> arr = np.arange(17)
>>> arr
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
Then I want to make a function like the following to grab numElems evenly spaced out within the array, which must include the first and last element:
GetSpacedElements(numElems = 4)
>>> returns 0, 5, 11, 16
Does this make sense?
I've tried arr[0:len:numElems] (i.e. using the array start:stop:skip notation) and some slight variations, but I'm not getting what I'm looking for here:
>>> arr[0:len:numElems]
array([ 0, 4, 8, 12, 16])
or
>>> arr[0:len:numElems+1]
array([ 0, 5, 10, 15])
I don't care exactly what the middle elements are, as long as they're spaced evenly apart, off by an index of 1 let's say. But getting the right number of elements, including the index zero and last index, are critical.
To get a list of evenly spaced indices, use np.linspace:
idx = np.round(np.linspace(0, len(arr) - 1, numElems)).astype(int)
Next, index back into arr to get the corresponding values:
arr[idx]
Always use rounding before casting to integers. Internally, linspace calls astype when the dtype argument is provided. Therefore, this method is NOT equivalent to:
# these simply truncate the non-integer part
idx = np.linspace(0, len(arr) - 1, numElems).astype(int)
idx = np.linspace(0, len(arr) - 1, numElems, dtype='int')
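The difference between rounding and truncating shows up with the 17-element array from the question. A small sketch (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(17)
num_elems = 4
positions = np.linspace(0, len(arr) - 1, num_elems)  # roughly [0, 5.33, 10.67, 16]

rounded = np.round(positions).astype(int)    # nearest index
truncated = positions.astype(int)            # fractional part dropped
print(rounded)    # [ 0  5 11 16]
print(truncated)  # [ 0  5 10 16]
```

Only the rounded version reproduces the 0, 5, 11, 16 asked for in the question.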
Your GetSpacedElements() function should also take in the array to avoid unfortunate side effects elsewhere in code. That said, the function would need to look like this:
import numpy as np
def GetSpacedElements(array, numElems = 4):
    out = array[np.round(np.linspace(0, len(array)-1, numElems)).astype(int)]
    return out
arr = np.arange(17)
print(arr)
spacedArray = GetSpacedElements(arr, 4)
print(spacedArray)
If you want to know more about finding indices that match values you seek, also have a look at numpy.argmin and numpy.where. Implementing the former:
import numpy as np
test = np.arange(17)
def nearest_index(array, value):
    return (np.abs(np.asarray(array) - value)).argmin()
def evenly_spaced_indices(array, steps):
    return [nearest_index(array, value) for value in np.linspace(np.min(array), np.max(array), steps)]
print(evenly_spaced_indices(test,4))
You should keep in mind that this is an unnecessary amount of function calls for the initial question you asked, as swiftly demonstrated by coldspeed. np.round rounds to the closest integer, which then serves as the index; it implements a similar process, but in optimised compiled code. If you are interested in the indices too, you could have your function simply return both:
import numpy as np
def GetSpacedElements(array, numElems=4, returnIndices=False):
    # use the function argument, not the global arr, to compute the indices
    indices = np.round(np.linspace(0, len(array) - 1, numElems)).astype(int)
    values = array[indices]
    return (values, indices) if returnIndices else (values)
arr = np.arange(17) + 42
print(arr)
print(GetSpacedElements(arr, 4)) # values only
print(GetSpacedElements(arr, 4, returnIndices=True)[0]) # values only
print(GetSpacedElements(arr, 4, returnIndices=True)[1]) # indices only
To get N evenly spaced elements from list 'x':
x[::int(np.ceil( len(x) / N ))]
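A quick check suggests that this step-based one-liner does not always return exactly N elements and does not always include the last element, so it may not meet the requirements stated in the question. The loop below is only an illustration on the 17-element example:

```python
import numpy as np

x = np.arange(17)
results = {}
for n in (4, 5, 7):
    step = int(np.ceil(len(x) / n))
    picked = x[::step]
    # record how many elements came back and whether the last one is included
    results[n] = (len(picked), bool(picked[-1] == x[-1]))
    print(n, results[n])
```

For n = 4 the last element is missed, and for n = 7 only six elements are returned, whereas the linspace-based indexing always yields n elements including both ends.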
I have a 3D numpy array of the shape (1, 60, 1). Now I need to remove the first value of the second dimension and instead append a new value at the end.
If it was a list, the code would look somewhat like this:
x = [1, 2, 3, 4]
x = x[1:]
x.append(5)
resulting in this list: [2, 3, 4, 5]
What would be the easiest way to do this with numpy?
I have basically never really worked with numpy before, so that's probably a pretty trivial problem, but thanks for your help!
import numpy as np
arr = np.arange(60) # create an array with 60 values
arr = arr.reshape(1, 60, 1) # shape it as described in the question
arr = np.roll(arr, -1, axis=1) # shift one step to the left along the second dimension
# Every value has moved one position towards the front, and the old first value has wrapped around to the last position
arr[:, -1, :] = 1000 # overwrite the last position with the value you want
print(arr)
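An alternative sketch that mirrors the list version more literally (drop the first entry, then append) uses slicing plus np.concatenate. The value 1000 and the name new_value are placeholders.

```python
import numpy as np

arr = np.arange(60).reshape(1, 60, 1)
new_value = np.full((1, 1, 1), 1000)   # must match arr in every dimension except axis 1

# Drop the first entry along axis 1, then append the new one at the end
shifted = np.concatenate([arr[:, 1:, :], new_value], axis=1)
print(shifted.shape)                          # (1, 60, 1)
print(shifted[0, :3, 0], shifted[0, -1, 0])
```

Unlike the roll-and-overwrite approach, this builds a new array and leaves arr untouched.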
I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array, by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of the values as well, so i can calculate its new value based on its neighbours. Enumerate (logically) returns 0, 1, etc. instead of the indices I need.
Here's the solution I came up with.
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])
It is very ugly method i.m.o. and I think there has to be a better way to do this. Any suggestions?
Thanks!
EDIT:
I figured out a better way to achieve what I think you want to do. This code takes every window of 5 elements and computes its (masked) average, then uses those values to fill the gaps in the original array. If an index has no unmasked value close enough, it is left masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
The originally posted code has a few errors. First, it both reads and writes values of y_filtered in the loop, so the results at later indices are affected by previous iterations; this could be fixed by reading from a copy of the original y_filtered. Second, [i-2:i+2] should probably be [max(i-2, 0):i+3], so that the window is symmetric and never starts before index zero.
You could do this:
from itertools import chain
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
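A self-contained sketch of this loop, reading from a snapshot so that earlier fills do not influence later ones. The sample data is made up for illustration, and the masked mean is used as a stand-in for the real neighbour calculation.

```python
import numpy as np
import numpy.ma as ma

# Made-up sample data: positions 2 and 3 are masked
y_filtered = ma.array([1.0, 2.0, 0.0, 0.0, 5.0, 6.0],
                      mask=[0, 0, 1, 1, 0, 0])
snapshot = y_filtered.copy()   # read from the snapshot, write to y_filtered

for s in ma.clump_masked(snapshot):
    for idx in range(s.start, s.stop):
        window = snapshot[max(idx - 2, 0):idx + 3]
        y_filtered[idx] = window.mean()   # masked entries are ignored by the masked mean

print(y_filtered)
```

Here index 2 is filled from the unmasked neighbours 1, 2 and 5, and index 3 from 2, 5 and 6.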
I am coming from a MATLAB background and moving over to Python. I am trying to figure out a way to set up a variable which is some vector which contains a range of indices which can then be used to slice some other array.
In MATLAB I would do this:
A = [2,3,4,5,6; 9,4,3,2,1; 5,4,3,2,5]; %some arbitrary matrix
begin = 2; %the first index I want to pull
end = 4; %the last index I want to pull
idx = 2:4; %the vector of indices I want
A(:,idx) %results in me pulling out the 2nd, 3rd and 4th column of A
Now in Python, what is the equivalent?
import numpy as np
A = np.array([[2,3,4,5,6],[9,4,3,2,1],[5,4,3,2,5]]) #some arbitrary matrix
begin = 1 #first index
end = 3 #last index
idx = ??? #This is the part I don't know! <<<-------------------
A[:,idx] #I want the same result as the Matlab example above
Obviously for this trivial example I could just have idx = [1,2,3], but I have much more complicated scenario in real life where I cannot write out the indices manually.
I have tried using the range and np.arange functions but they give the error that the object is not callable.
When I look at some MATLAB-to-NumPy conversions such as here, it suggests that the idx = 2:4 command in MATLAB is equivalent to idx = range(1,3) in Python, but this is apparently not quite true?
Any help is appreciated.
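You need slice. Note that, as with all Python slices, the end is exclusive, so use slice(begin, end + 1) if the column at index end should be included: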
>>> import numpy as np
>>> A = np.array([[2,3,4,5,6],[9,4,3,2,1],[5,4,3,2,5]])
>>> begin = 1
>>> end = 3
>>> s = slice(begin, end)
>>> A[:,s]
array([[3, 4],
[4, 3],
[4, 3]])
You would need to do this:
idx = range(begin, end + 1)
Notice that you need to add 1 to end because range excludes its stop value, i.e. range(begin, end) stops at end - 1.
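To reproduce the MATLAB result (columns 2 to 4, i.e. zero-based 1 to 3), the end has to be included explicitly. A short sketch comparing a slice with an index array:

```python
import numpy as np

A = np.array([[2, 3, 4, 5, 6], [9, 4, 3, 2, 1], [5, 4, 3, 2, 5]])
begin, end = 1, 3   # zero-based positions of MATLAB columns 2 and 4

by_slice = A[:, slice(begin, end + 1)]        # basic slicing, returns a view
by_arange = A[:, np.arange(begin, end + 1)]   # integer array indexing, returns a copy
print(by_slice)
print(np.array_equal(by_slice, by_arange))    # True
```

Both give the same values; the slice avoids copying, while the index array can hold arbitrary, non-contiguous columns.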
A fairly general and convenient way to "freeze" an indexing expression is np.s_:
a = np.arange(12).reshape(3, 4)
idx = np.s_[1:3]
a[:, idx]
# array([[ 1, 2],
# [ 5, 6],
# [ 9, 10]])
idx = np.s_[::2, [1, 3, 0]]
a[idx]
# array([[ 1, 3, 0],
# [ 9, 11, 8]])
What is the most efficient way to remove the last element from a numpy 1 dimensional array? (like pop for list)
NumPy arrays have a fixed size, so you cannot remove an element in-place. For example using del doesn't work:
>>> import numpy as np
>>> arr = np.arange(5)
>>> del arr[-1]
ValueError: cannot delete array elements
Note that the index -1 represents the last element. That's because negative indices in Python (and NumPy) are counted from the end, so -1 is the last, -2 is the one before last and -len is actually the first element. That's just for your information in case you didn't know.
Python lists are variable sized so it's easy to add or remove elements.
So if you want to remove an element you need to create a new array or view.
Creating a new view
You can create a new view containing all elements except the last one using the slice notation:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> arr[:-1] # all but the last element
array([0, 1, 2, 3])
>>> arr[:-2] # all but the last two elements
array([0, 1, 2])
>>> arr[1:] # all but the first element
array([1, 2, 3, 4])
>>> arr[1:-1] # all but the first and last element
array([1, 2, 3])
However a view shares the data with the original array, so if one is modified so is the other:
>>> sub = arr[:-1]
>>> sub
array([0, 1, 2, 3])
>>> sub[0] = 100
>>> sub
array([100, 1, 2, 3])
>>> arr
array([100, 1, 2, 3, 4])
Creating a new array
1. Copy the view
If you don't like this memory sharing, you have to create a new array. In this case it's probably simplest to create a view and then copy it (for example using the copy() method of arrays):
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> sub_arr = arr[:-1].copy()
>>> sub_arr
array([0, 1, 2, 3])
>>> sub_arr[0] = 100
>>> sub_arr
array([100, 1, 2, 3])
>>> arr
array([0, 1, 2, 3, 4])
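Whether a result is a view or a copy can also be checked directly instead of by modifying elements. A small sketch using np.shares_memory and the base attribute:

```python
import numpy as np

arr = np.arange(5)
view = arr[:-1]
copied = arr[:-1].copy()

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
print(view.base is arr)               # True
```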
2. Using integer array indexing
However, you can also use integer array indexing to remove the last element and get a new array. Integer array indexing always creates a copy, never a view:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> indices_to_keep = [0, 1, 2, 3]
>>> sub_arr = arr[indices_to_keep]
>>> sub_arr
array([0, 1, 2, 3])
>>> sub_arr[0] = 100
>>> sub_arr
array([100, 1, 2, 3])
>>> arr
array([0, 1, 2, 3, 4])
This integer array indexing can be useful to remove arbitrary elements from an array (which can be tricky or impossible when you want a view):
>>> arr = np.arange(5, 10)
>>> arr
array([5, 6, 7, 8, 9])
>>> arr[[0, 1, 3, 4]] # keep first, second, fourth and fifth element
array([5, 6, 8, 9])
If you want a generalized function that removes the last element using integer array indexing:
def remove_last_element(arr):
return arr[np.arange(arr.size - 1)]
3. Using boolean array indexing
There is also boolean indexing that could be used, for example:
>>> arr = np.arange(5, 10)
>>> arr
array([5, 6, 7, 8, 9])
>>> keep = [True, True, True, True, False]
>>> arr[keep]
array([5, 6, 7, 8])
This also creates a copy! And a generalized approach could look like this:
def remove_last_element(arr):
if not arr.size:
raise IndexError('cannot remove last element of empty array')
keep = np.ones(arr.shape, dtype=bool)
keep[-1] = False
return arr[keep]
If you would like more information on NumPy's indexing, the documentation on "Indexing" is quite good and covers a lot of cases.
4. Using np.delete()
Normally I wouldn't recommend the NumPy functions that "seem" like they are modifying the array in-place (like np.append and np.insert) but do return copies because these are generally needlessly slow and misleading. You should avoid them whenever possible, that's why it's the last point in my answer. However in this case it's actually a perfect fit so I have to mention it:
>>> arr = np.arange(10, 20)
>>> arr
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> np.delete(arr, -1)
array([10, 11, 12, 13, 14, 15, 16, 17, 18])
5. Using np.resize()
NumPy has another method that sounds like it does an in-place operation but it really returns a new array:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> np.resize(arr, arr.size - 1)
array([0, 1, 2, 3])
To remove the last element I simply provided a new shape that is 1 smaller than before, which effectively removes the last element.
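One caveat worth knowing: np.resize only truncates when the new size is smaller. With a larger size it fills the extra slots with repeated copies of the input, as this sketch shows:

```python
import numpy as np

arr = np.arange(5)
smaller = np.resize(arr, arr.size - 1)
larger = np.resize(arr, arr.size + 2)   # pads by repeating arr from the start
print(smaller)  # [0 1 2 3]
print(larger)   # [0 1 2 3 4 0 1]
```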
Modifying the array inplace
Yes, I've written previously that you cannot modify an array in place. But I said that because in most cases it's not possible or only by disabling some (completely useful) safety checks. I'm not sure about the internals but depending on the old size and the new size it could be possible that this includes an (internal-only) copy operation so it might be slower than creating a view.
Using np.ndarray.resize()
If the array doesn't share its memory with any other array, then it's possible to resize the array in place:
>>> arr = np.arange(5, 10)
>>> arr.resize(4)
>>> arr
array([5, 6, 7, 8])
However, that will raise a ValueError if the array is referenced by another array as well:
>>> arr = np.arange(5)
>>> view = arr[1:]
>>> arr.resize(4)
ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function
You can disable that safety-check by setting refcheck=False but that shouldn't be done lightly because you make yourself vulnerable for segmentation faults and memory corruption in case the other reference tries to access the removed elements! This refcheck argument should be treated as an expert-only option!
Summary
Creating a view is really fast and doesn't take much additional memory, so you should work with views whenever possible. However, depending on the use case, it's not so easy to remove arbitrary elements using basic slicing. While it's easy to remove the first n elements and/or the last n elements, or to keep only every x-th element (the step argument for slicing), this is all you can do with it.
But in your case of removing the last element of a one-dimensional array I would recommend:
arr[:-1] # if you want a view
arr[:-1].copy() # if you want a new array
because these most clearly express the intent and everyone with Python/NumPy experience will recognize that.
Timings
Based on the timing framework from this answer:
# Setup
import numpy as np
def view(arr):
    return arr[:-1]
def array_copy_view(arr):
    return arr[:-1].copy()
def array_int_index(arr):
    return arr[np.arange(arr.size - 1)]
def array_bool_index(arr):
    if not arr.size:
        raise IndexError('cannot remove last element of empty array')
    keep = np.ones(arr.shape, dtype=bool)
    keep[-1] = False
    return arr[keep]
def array_delete(arr):
    return np.delete(arr, -1)
def array_resize(arr):
    return np.resize(arr, arr.size - 1)
# Timing setup
timings = {view: [],
           array_copy_view: [], array_int_index: [], array_bool_index: [],
           array_delete: [], array_resize: []}
sizes = [2**i for i in range(1, 20, 2)]
# Timing
for size in sizes:
    print(size)
    func_input = np.random.random(size=size)
    for func in timings:
        print(func.__name__.ljust(20), ' ', end='')
        res = %timeit -o func(func_input) # if you use IPython, otherwise use the "timeit" module
        timings[func].append(res)
# Plotting
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(1)
ax = plt.subplot(111)
for func in timings:
    ax.plot(sizes,
            [time.best for time in timings[func]],
            label=func.__name__)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()
I get the following timings as log-log plot to cover all the details, lower time still means faster, but the range between two ticks represents one order of magnitude instead of a fixed amount. In case you're interested in the specific values, I copied them into this gist:
According to these timings those two approaches are also the fastest. (Python 3.6 and NumPy 1.14.0)
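Outside IPython, the %timeit magic is not available. A rough sketch of the same measurement with the standard timeit module, reduced to three of the functions and one input size; the repeat and number values are arbitrary choices, and absolute times will differ between machines:

```python
import timeit
import numpy as np

def view(arr):
    return arr[:-1]

def array_copy_view(arr):
    return arr[:-1].copy()

def array_delete(arr):
    return np.delete(arr, -1)

func_input = np.random.random(size=2**10)
best = {}
for func in (view, array_copy_view, array_delete):
    # best of 3 repeats, each averaged over 1000 calls
    runs = timeit.repeat(lambda: func(func_input), repeat=3, number=1000)
    best[func.__name__] = min(runs) / 1000
    print(func.__name__.ljust(20), best[func.__name__])
```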
If you just want the array without its last element (without explicitly deleting anything), use slicing:
array[:-1]
To delete the last element from a 1-dimensional NumPy array, use the numpy.delete method, like so:
import numpy as np
# Create a 1-dimensional NumPy array that holds 5 values
values = np.array([1, 2, 3, 4, 5])
# Remove the last element of the array using the numpy.delete method
values = np.delete(values, -1)
print(values)
Output:
[1 2 3 4]
The last value of the NumPy array, which was 5, is now removed.