Appending successive rows to Python dataframe - python

I want to create a bidimensional numpy array.
I tried this:
import numpy as np
result = np.empty
np.append(result, [1, 2, 3])
np.append(result, [4, 5, 9])
1. The array should have dimensions (2, 3). How can I get them?
I tried:
print(np.shape(result))
print(np.size(result))
But this prints:
()
1
2. How can I access a specific element in the array?
I tried this:
print(result.item((1, 2)))
But this returns:
Traceback (most recent call last):
File "python", line 10, in <module>
AttributeError: 'builtin_function_or_method' object has no attribute 'item'

Ideally you should be testing this sort of code in an interactive session, where you can easily get more information on the steps. I'll illustrate in ipython.
In [1]: result = np.empty
In [2]: result
Out[2]: <function numpy.core.multiarray.empty>
This is a function, not an array. The correct use is result = np.empty((3,)); that is, you have to call the function with the desired shape.
In [3]: np.append(result, [1,2,3])
Out[3]: array([<built-in function empty>, 1, 2, 3], dtype=object)
append has created an array, but look at its contents (the function object and 3 numbers) and at the dtype. Also, np.append returns a new array; it does not work in place.
In [4]: result.item((1,2))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-51f2b4be4f43> in <module>()
----> 1 result.item((1,2))
AttributeError: 'builtin_function_or_method' object has no attribute 'item'
Your error tells us that result is a function, not an array. The same thing you set at the start.
In [5]: np.shape(result)
Out[5]: ()
In [6]: np.array(result)
Out[6]: array(<built-in function empty>, dtype=object)
In this case the function versions of np.shape and np.size aren't diagnostic, because they first convert result into an array. result.shape would have given an error.
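A minimal sketch of that difference, assuming only that NumPy is imported as np (the name not_an_array is made up for illustration): the function form wraps its argument in a 0-d object array before reporting a shape, while attribute access fails immediately.

```python
import numpy as np

not_an_array = np.empty          # the function itself, never called

# The function form converts its argument to an array first, so it
# reports the shape of a 0-d object array instead of raising.
print(np.shape(not_an_array))    # ()

# Attribute access does no conversion and fails right away.
try:
    not_an_array.shape
except AttributeError as exc:
    print("AttributeError:", exc)
```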
The underlying problem is that you are using a list model
result = []
result.append([1,2,3])
result.append([4,5,6])
But the array append is misnamed, and misused. It is just a front end to np.concatenate. If you don't understand concatenate, you probably won't use np.append right. In fact, I would argue that you shouldn't use np.append at all.
The correct way to use append is to start with an array that has a dimension of size 0, and reuse the result:
In [7]: result = np.empty((0,3),int)
In [8]: result
Out[8]: array([], shape=(0, 3), dtype=int32)
In [9]: result = np.append(result,[1,2,3])
In [10]: result
Out[10]: array([1, 2, 3])
In [11]: result = np.append(result,[4,5,6])
In [12]: result
Out[12]: array([1, 2, 3, 4, 5, 6])
But maybe that isn't what you want? Even I'm misusing append.
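For comparison, here is a sketch of what the session above would need in order to keep two dimensions: each new row has to be 2-D itself, and axis=0 has to be passed so that np.append does not flatten everything. This is one possible reading of the asker's intent, not necessarily the approach to prefer.

```python
import numpy as np

# Start with zero rows and three columns.
result = np.empty((0, 3), dtype=int)

# Each appended value is a 1x3 array (note the double brackets), and
# axis=0 tells np.append to stack along rows instead of flattening.
result = np.append(result, [[1, 2, 3]], axis=0)
result = np.append(result, [[4, 5, 6]], axis=0)

print(result.shape)   # (2, 3)
print(result[1, 2])   # 6
```

Every call still copies the whole array, so this does not scale well to many rows.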
Back to the drawing board:
In [15]: result = []
In [16]: result.append([1,2,3])
In [17]: result.append([4,5,6])
In [18]: result
Out[18]: [[1, 2, 3], [4, 5, 6]]
In [19]: result = np.array(result)
In [20]: result
Out[20]:
array([[1, 2, 3],
       [4, 5, 6]])
With a real array, your item expression works, though usually we use [] indexing:
In [21]: result[1,2]
Out[21]: 6
In [22]: result.item((1,2))
Out[22]: 6
Source code for np.append (note the use of np.concatenate):
In [23]: np.source(np.append)
In file: /usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py
def append(arr, values, axis=None):
    """
    ...
    """
    arr = asanyarray(arr)
    if axis is None:
        if arr.ndim != 1:
            arr = arr.ravel()
        values = ravel(values)
        axis = arr.ndim-1
    return concatenate((arr, values), axis=axis)

That's not quite the way to use numpy arrays. empty is a function. append, for example, returns new arrays, but you're ignoring the return value.
To create the 2-d array, use this:
In [3]: result = np.array([[1, 2, 3], [4, 5, 9]])
To find its shape:
In [4]: result.shape
Out[4]: (2, 3)
To access a specific element:
In [5]: result[1][2]
Out[5]: 9

Related

Append element into a numpy empty array, but the maximum index will be fixed as 2

I want to append some elements to a fixed-size empty array. But when I append more than 2 elements, I get this error:
[1 2]
Traceback (most recent call last):
File "/home/ctchan127au/Desktop/DSA/WS3/tryNP.py", line 8, in <module>
stack = np.append(stack[2], 3)
IndexError: index 2 is out of bounds for axis 0 with size 2
Here's my code:
import numpy as np
stack = np.empty(5, dtype=object)
stack = np.append(stack[0], 1)
stack = np.append(stack[1], 2)
print(stack)
stack = np.append(stack[2], 3)
print(stack)
thanks
np.empty with object makes an array full of None:
In [12]: stack = np.empty(5, dtype=object)
In [13]: stack
Out[13]: array([None, None, None, None, None], dtype=object)
np.append makes a new array, here joining a None and 1:
In [14]: np.append(stack[0],1)
Out[14]: array([None, 1], dtype=object)
Do you really want to assign that to stack, losing the original 5 element array?
Using Out[14] instead of that stack assignment, see what the next append does:
In [15]: np.append(Out[14][1],2)
Out[15]: array([1, 2])
And the error line should now be clear. You are trying to append to a non-existent 3rd element of a 2 element array:
In [16]: np.append(Out[15][2],3)
Traceback (most recent call last):
Input In [16] in <cell line: 1>
np.append(Out[15][2],3)
IndexError: index 2 is out of bounds for axis 0 with size 2
np.empty and np.append are NOT list clones. Pay close attention to what each step does. Reading the docs is also a good idea, especially if behavior doesn't match your guesses.
I suspect you were trying to replicate this:
In [17]: alist=[]
In [18]: alist.append(1)
In [19]: alist.append(2)
In [20]: alist.append(3)
In [21]: alist
Out[21]: [1, 2, 3]
Or are you trying to assign values, one at a time, to an array:
In [39]: stack = np.zeros(5, dtype=int)
In [40]: stack[0] = 1
In [41]: stack[1] = 2
In [42]: stack[2] = 3
In [43]: stack
Out[43]: array([1, 2, 3, 0, 0])
While lists are designed for growth, especially with appends at the end, "growing/shrinking" numpy arrays is generally not a good idea. You have to make a new array with each change (I'm glossing over the rarely used resize method).
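If a fixed-size stack really is the goal, one possible sketch is to preallocate the array once and track how many slots are in use. The names capacity and count are assumptions introduced for illustration; they are not from the question.

```python
import numpy as np

capacity = 5                              # hypothetical fixed stack size
stack = np.zeros(capacity, dtype=int)     # allocated once, never regrown
count = 0                                 # number of slots currently in use

for value in (1, 2, 3):
    if count >= capacity:
        raise OverflowError("stack is full")
    stack[count] = value                  # assign in place, no np.append
    count += 1

print(stack[:count])   # [1 2 3]
print(stack)           # [1 2 3 0 0]
```

Only stack[:count] holds meaningful values; the remaining slots are unused padding.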

locate numpy indices based on closest value in 2d array with unmatch dimensions

Edit: I made a mistake in the original version. The dimensions of the arrays are unequal now.
This is a stupid question but I can't find the right answer.
How do you index the closest number in a 2d numpy array? Let's say we have
e = np.array([[1, 2], [4, 5, 6]])
I want to locate the indices of the value closest to 2, such that it returns
array([1, 0])
Many thanks!
Usually you would use np.argwhere(e == 2):
In [4]: e = np.array([[1,2,3],[4,5,6]])
In [6]: np.argwhere(e == 2)
Out[6]: array([[0, 1]])
In case you really need the output you specified, you have to add an extra [0]
In [7]: np.argwhere(e == 2)[0]
Out[7]: array([0, 1])
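However, the input you provided is not a standard numeric array but an object array, because len(e[0]) != len(e[1]). (NumPy 1.24 and later refuse to build such a ragged array implicitly and raise a ValueError unless dtype=object is passed.)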
In [1]: e = np.array([[1,2],[4,5,6]])
In [3]: e
Out[3]: array([list([1, 2]), list([4, 5, 6])], dtype=object)
This makes numpy much less useful and efficient. You would have to resort to something like:
In [26]: res = []
    ...: for i, f in enumerate(e):
    ...:     g = np.array(f)
    ...:     w = np.argwhere(g==2)
    ...:     if len(w):
    ...:         res += [(i, v) for v in w.ravel()]
    ...: res = np.array(res)
Assuming this was a typo and if you are interested in the value closest to 2 even if 2 is not present, you would have to do something like:
In [35]: np.unravel_index((np.abs(e - 2.2)).argmin(), e.shape)
Out[35]: (0, 1)
Here I chose 2.2 as an example value.
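The unravel_index approach needs a regular 2-D array. For the ragged input from the question, a plain-Python sketch along the following lines may be enough; closest_index is a hypothetical helper name, and the rows are kept as ordinary lists rather than a NumPy array.

```python
def closest_index(rows, target):
    """Return (row, col) of the entry in a ragged list of lists nearest to target."""
    # Enumerate every valid (row, col) pair, whatever each row's length is.
    candidates = (
        (i, j) for i, row in enumerate(rows) for j in range(len(row))
    )
    # Pick the pair whose value has the smallest absolute distance to target.
    return min(candidates, key=lambda ij: abs(rows[ij[0]][ij[1]] - target))


rows = [[1, 2], [4, 5, 6]]
print(closest_index(rows, 2.2))   # (0, 1)
```

If every row is empty, min raises a ValueError, so callers may want to guard against that case.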
This can be done by defining a function that works on a 1D array and applying it over the rows of the 2D array:
e = np.array([[1,2,3], [4,5,6]])
# function to find position of nearest value in 1D array
def find_nearest(a, val):
    return np.abs(a - val).argmin()
# apply it
np.apply_along_axis(find_nearest, axis = 1, arr = e, val = 2)

Appending a new row to a numpy array

I am trying to append a new row to an existing numpy array in a loop. I have tried the methods involving append, concatenate, and also vstack, but none of them gives me the result I want.
I have tried the following:
for _ in col_change:
if (item + 2 < len(col_change)):
arr=[col_change[item], col_change[item + 1], col_change[item + 2]]
array=np.concatenate((array,arr),axis=0)
item+=1
I have also tried it in the most basic format and it still gives me an empty array.
array=np.array([])
newrow = [1, 2, 3]
newrow1 = [4, 5, 6]
np.concatenate((array,newrow), axis=0)
np.concatenate((array,newrow1), axis=0)
print(array)
I want the output to be [[1,2,3][4,5,6]...]
The correct way to build an array incrementally is to not start with an array:
alist = []
alist.append([1, 2, 3])
alist.append([4, 5, 6])
arr = np.array(alist)
This is essentially the same as
arr = np.array([ [1,2,3], [4,5,6] ])
the most common way of making a small (or large) sample array.
Even if you have good reason to use some version of concatenate (hstack, vstack, etc.), it is better to collect the components in a list and perform the concatenation once.
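Applied to the loop in the question, the collect-then-convert pattern might look like the sketch below. The contents of col_change are made up for illustration, and the window of three consecutive items is inferred from the question's indexing.

```python
import numpy as np

col_change = [10, 20, 30, 40, 50]   # hypothetical sample data

# Collect each 3-item window in a plain list first...
rows = []
for item in range(len(col_change) - 2):
    rows.append(col_change[item:item + 3])

# ...and build the array once at the end.
array = np.array(rows)
print(array.shape)   # (3, 3)
```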
If you want [[1,2,3],[4,5,6]] I could present you an alternative without append: np.arange and then reshape it:
>>> import numpy as np
>>> np.arange(1,7).reshape(2, 3)
array([[1, 2, 3],
       [4, 5, 6]])
Or create a big array and fill it manually (or in a loop):
>>> array = np.empty((2, 3), int)
>>> array[0] = [1,2,3]
>>> array[1] = [4,5,6]
>>> array
array([[1, 2, 3],
       [4, 5, 6]])
A note on your examples:
In the second one you forgot to save the result. Make it array = np.concatenate((array,newrow1), axis=0) and it works (not exactly the way you want, but the array is not empty anymore). The first example seems badly indented, and without knowing the variables and/or the problem there it's hard to debug.

Numpy: apply along axis returns error: Setting an array element with a sequence

a very quick and simple error that I can't figure out to save my life:
temp = np.array([[5,0,3,5,6,0],
                 [2,2,1,3,0,0],
                 [5,3,4,5,3,4]])
def myfunc(x):
    return x[np.nonzero(x)]
np.apply_along_axis(myfunc, axis=1, arr=temp)
Expected output is the non-zero numbers of each ROW of my temp array:
[5,3,5,6],[2,2,1,3],[5,3,4,5,3,4]
However, I'm getting the error: ValueError: setting an array element with a sequence.
If I simply do it without apply_along_axis, it works:
# prints [5 3 5 6]
print(temp[0][np.nonzero(temp[0])])
The weird thing is that, if I just add a np.mean() to the myfunc return to the first code block above, it works as expected:
# This works as expected
temp = np.array([[5,0,3,5,6,0],
                 [2,2,1,3,0,0],
                 [5,3,4,5,3,4]])
def myfunc(x):
    return np.mean(x[np.nonzero(x)])
np.apply_along_axis(myfunc, axis=1, arr=temp)
I suspect it has something to do with how apply_along_axis works under the hood. Any tips will be appreciated!
As mentioned in the documentation -
Returns: apply_along_axis : ndarray The output array. The shape of
outarr is identical to the shape of arr, except along the axis
dimension, where the length of outarr is equal to the size of the
return value of func1d. If func1d returns a scalar outarr will have
one fewer dimensions than arr.
Because of the inconsistent shapes of the output at different iterations, it seems we are getting that error.
Now, to solve your problem, let me suggest a method with np.nonzero on the entire array and then splitting the second output from it -
In [165]: temp = np.array([[5,0,3,5,6,0],
     ...:                  [2,2,1,3,0,0],
     ...:                  [5,3,4,5,3,4]])
In [166]: r,c = np.nonzero(temp)
     ...: idx = np.unique(r,return_index=1)[1]
     ...: out = np.split(c,idx[1:])
     ...:
In [167]: out
Out[167]: [array([0, 2, 3, 4]), array([0, 1, 2, 3]), array([0, 1, 2, 3, 4, 5])]
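Note that out above holds the column indices of the non-zero entries, not the values themselves. A sketch of the same splitting idea applied to the values, with a plain list comprehension for comparison, could look like this. One caveat: a row with no non-zero entries produces no group in the split version.

```python
import numpy as np

temp = np.array([[5, 0, 3, 5, 6, 0],
                 [2, 2, 1, 3, 0, 0],
                 [5, 3, 4, 5, 3, 4]])

# Row and column indices of every non-zero entry, in row-major order.
r, c = np.nonzero(temp)
# Position in r where each new row starts.
idx = np.unique(r, return_index=True)[1]
# Split the non-zero values (not the column indices) at those positions.
values = np.split(temp[r, c], idx[1:])

# Simpler alternative: one boolean mask per row.
values_by_row = [row[row != 0] for row in temp]

print(values)
```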
In numpy 1.13, this definition should work:
def myfunc(x):
    res = np.empty((), dtype=object)
    res[()] = x[np.nonzero(x)]
    return res
By returning a 0d array containing the array, numpy will not try and stack the subarrays.

Numpy - the best way to remove the last element from 1 dimensional array?

What is the most efficient way to remove the last element from a numpy 1 dimensional array? (like pop for list)
NumPy arrays have a fixed size, so you cannot remove an element in-place. For example using del doesn't work:
>>> import numpy as np
>>> arr = np.arange(5)
>>> del arr[-1]
ValueError: cannot delete array elements
Note that the index -1 represents the last element. That's because negative indices in Python (and NumPy) are counted from the end, so -1 is the last, -2 is the one before last and -len is actually the first element. That's just for your information in case you didn't know.
Python lists are variable sized so it's easy to add or remove elements.
So if you want to remove an element you need to create a new array or view.
Creating a new view
You can create a new view containing all elements except the last one using the slice notation:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> arr[:-1] # all but the last element
array([0, 1, 2, 3])
>>> arr[:-2] # all but the last two elements
array([0, 1, 2])
>>> arr[1:] # all but the first element
array([1, 2, 3, 4])
>>> arr[1:-1] # all but the first and last element
array([1, 2, 3])
However a view shares the data with the original array, so if one is modified so is the other:
>>> sub = arr[:-1]
>>> sub
array([0, 1, 2, 3])
>>> sub[0] = 100
>>> sub
array([100, 1, 2, 3])
>>> arr
array([100, 1, 2, 3, 4])
Creating a new array
1. Copy the view
If you don't want this memory sharing, you have to create a new array. In this case it's probably simplest to create a view and then copy it (for example using the array's copy() method):
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> sub_arr = arr[:-1].copy()
>>> sub_arr
array([0, 1, 2, 3])
>>> sub_arr[0] = 100
>>> sub_arr
array([100, 1, 2, 3])
>>> arr
array([0, 1, 2, 3, 4])
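When it isn't obvious whether a result is a view or a copy, it can be checked directly. The following is a small sketch using np.shares_memory and the base attribute; the variable names are chosen for illustration.

```python
import numpy as np

arr = np.arange(5)
sub_view = arr[:-1]          # basic slicing: a view on arr's data
sub_copy = arr[:-1].copy()   # explicit copy: independent data

# A view shares memory with the original and records it as its base.
print(np.shares_memory(arr, sub_view))   # True
print(sub_view.base is arr)              # True

# A copy shares nothing with the original.
print(np.shares_memory(arr, sub_copy))   # False
```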
2. Using integer array indexing
However, you can also use integer array indexing to remove the last element and get a new array. Integer array indexing (a form of advanced indexing) always creates a copy, never a view:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> indices_to_keep = [0, 1, 2, 3]
>>> sub_arr = arr[indices_to_keep]
>>> sub_arr
array([0, 1, 2, 3])
>>> sub_arr[0] = 100
>>> sub_arr
array([100, 1, 2, 3])
>>> arr
array([0, 1, 2, 3, 4])
This integer array indexing can be useful to remove arbitrary elements from an array (which can be tricky or impossible when you want a view):
>>> arr = np.arange(5, 10)
>>> arr
array([5, 6, 7, 8, 9])
>>> arr[[0, 1, 3, 4]] # keep first, second, fourth and fifth element
array([5, 6, 8, 9])
If you want a generalized function that removes the last element using integer array indexing:
def remove_last_element(arr):
    return arr[np.arange(arr.size - 1)]
3. Using boolean array indexing
There is also boolean indexing that could be used, for example:
>>> arr = np.arange(5, 10)
>>> arr
array([5, 6, 7, 8, 9])
>>> keep = [True, True, True, True, False]
>>> arr[keep]
array([5, 6, 7, 8])
This also creates a copy! And a generalized approach could look like this:
def remove_last_element(arr):
    if not arr.size:
        raise IndexError('cannot remove last element of empty array')
    keep = np.ones(arr.shape, dtype=bool)
    keep[-1] = False
    return arr[keep]
If you would like more information on NumPy's indexing, the documentation on "Indexing" is quite good and covers a lot of cases.
4. Using np.delete()
Normally I wouldn't recommend the NumPy functions that look as if they modify the array in place (like np.append and np.insert) but actually return copies, because they are generally needlessly slow and misleading. You should avoid them whenever possible, which is why this comes near the end of the list. However, in this case it's actually a perfect fit, so I have to mention it:
>>> arr = np.arange(10, 20)
>>> arr
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> np.delete(arr, -1)
array([10, 11, 12, 13, 14, 15, 16, 17, 18])
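As a quick check that np.delete really returns a new array and leaves its input alone, here is a small sketch. The list of indices in the second call is just an illustration of removing arbitrary positions.

```python
import numpy as np

arr = np.arange(10, 20)

trimmed = np.delete(arr, -1)           # new array without the last element
without_some = np.delete(arr, [0, 2])  # new array without positions 0 and 2

print(trimmed.size)    # 9
print(without_some)    # [11 13 14 15 16 17 18 19]
print(arr.size)        # 10, the original is unchanged
```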
5. Using np.resize()
NumPy has another function whose name sounds like an in-place operation but which really returns a new array:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> np.resize(arr, arr.size - 1)
array([0, 1, 2, 3])
To remove the last element I simply provided a new shape that is 1 smaller than before, which effectively removes the last element.
Modifying the array inplace
Yes, I wrote above that you cannot modify an array in place. I said that because in most cases it's either not possible or possible only by disabling some (genuinely useful) safety checks. I'm not sure about the internals, but depending on the old and new sizes the resize may involve an internal copy, so it might be slower than creating a view.
Using np.ndarray.resize()
If the array doesn't share its memory with any other array, then it's possible to resize the array in place:
>>> arr = np.arange(5, 10)
>>> arr.resize(4)
>>> arr
array([5, 6, 7, 8])
However, that raises a ValueError if the array is also referenced by another array:
>>> arr = np.arange(5)
>>> view = arr[1:]
>>> arr.resize(4)
ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function
You can disable that safety check by setting refcheck=False, but that shouldn't be done lightly because you make yourself vulnerable to segmentation faults and memory corruption in case the other reference tries to access the removed elements! This refcheck argument should be treated as an expert-only option!
Summary
Creating a view is really fast and takes almost no additional memory, so you should work with views whenever possible. However, basic slicing is limited: it's easy to remove the first n and/or last n elements, or to keep only every x-th element (the step argument of a slice), but that is all it can do, so removing arbitrary elements requires one of the copying approaches.
But in your case of removing the last element of a one-dimensional array I would recommend:
arr[:-1] # if you want a view
arr[:-1].copy() # if you want a new array
because these most clearly express the intent and everyone with Python/NumPy experience will recognize that.
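Since the question asks for something like list.pop, one possible sketch wraps the slicing recommendation in a small helper. pop_last is a hypothetical name, and unlike list.pop it cannot shrink the array in place, so it returns the remaining elements as a view alongside the removed value.

```python
import numpy as np


def pop_last(arr):
    """Return (last_element, view_of_everything_before_it) for a 1-D array."""
    if arr.size == 0:
        raise IndexError("cannot pop from an empty array")
    return arr[-1], arr[:-1]


arr = np.arange(5)
last, rest = pop_last(arr)
print(last)   # 4
print(rest)   # [0 1 2 3]
```

Call rest.copy() afterwards if the result must not share memory with the original array.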
Timings
Based on the timing framework from this answer:
# Setup
import numpy as np
def view(arr):
    return arr[:-1]
def array_copy_view(arr):
    return arr[:-1].copy()
def array_int_index(arr):
    return arr[np.arange(arr.size - 1)]
def array_bool_index(arr):
    if not arr.size:
        raise IndexError('cannot remove last element of empty array')
    keep = np.ones(arr.shape, dtype=bool)
    keep[-1] = False
    return arr[keep]
def array_delete(arr):
    return np.delete(arr, -1)
def array_resize(arr):
    return np.resize(arr, arr.size - 1)
# Timing setup
timings = {view: [],
array_copy_view: [], array_int_index: [], array_bool_index: [],
array_delete: [], array_resize: []}
sizes = [2**i for i in range(1, 20, 2)]
# Timing
for size in sizes:
    print(size)
    func_input = np.random.random(size=size)
    for func in timings:
        print(func.__name__.ljust(20), ' ', end='')
        res = %timeit -o func(func_input) # if you use IPython, otherwise use the "timeit" module
        timings[func].append(res)
# Plotting
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(1)
ax = plt.subplot(111)
for func in timings:
    ax.plot(sizes,
            [time.best for time in timings[func]],
            label=func.__name__)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()
I get the following timings as a log-log plot. Lower still means faster, but the distance between two ticks represents one order of magnitude rather than a fixed amount. In case you're interested in the specific values, I copied them into this gist:
According to these timings those two approaches are also the fastest. (Python 3.6 and NumPy 1.14.0)
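The %timeit line in the timing loop only works inside IPython. Outside IPython, a rough equivalent with the standard timeit module might look like the sketch below. It covers only the two recommended functions, and the array size, repeat count, and number of calls are arbitrary choices, so the numbers are not comparable to the plot.

```python
import timeit

import numpy as np


def view(arr):
    return arr[:-1]


def array_copy_view(arr):
    return arr[:-1].copy()


func_input = np.random.random(size=1000)
number = 10_000   # calls per measurement (arbitrary choice)

best = {}
for func in (view, array_copy_view):
    # Run three measurements and keep the fastest, as %timeit's "best" does.
    runs = timeit.repeat(lambda: func(func_input), number=number, repeat=3)
    best[func.__name__] = min(runs) / number   # seconds per call
    print(func.__name__.ljust(20), best[func.__name__])
```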
If you just want the array without its last element (without explicitly deleting anything), use slicing:
array[:-1]
To delete the last element from a 1-dimensional NumPy array, use the numpy.delete method, like so:
import numpy as np
# Create a 1-dimensional NumPy array that holds 5 values
values = np.array([1, 2, 3, 4, 5])
# Remove the last element of the array using the numpy.delete method
values = np.delete(values, -1)
print(values)
Output:
[1 2 3 4]
The last value of the NumPy array, which was 5, is now removed.
