Numpy delete rows of array inside object array - python

I am trying to delete rows from arrays that are stored inside an object array in numpy. However, as you can see below, it complains that it cannot broadcast the smaller array into the larger one, even though the same operation works fine when applied directly to an ordinary array. What is the issue here? Is there any clean way around this error, other than making a new object array and copying the elements over one by one until I reach the array I want to modify?
In [1]: import numpy as np
In [2]: x = np.zeros((3, 2))
In [3]: x = np.delete(x, 1, axis=0)
In [4]: x
Out[4]:
array([[ 0.,  0.],
       [ 0.,  0.]])
In [5]: x = np.array([np.zeros((3, 2)), np.zeros((3, 2))], dtype=object)
In [6]: x[0] = np.delete(x[0], 1, axis=0)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-1687d284d03c> in <module>()
----> 1 x[0] = np.delete(x[0], 1, axis=0)
ValueError: could not broadcast input array from shape (2,2) into shape (3,2)
Edit: Apparently it works when the arrays have different shapes, which is quite annoying. Is there any way to disable this automatic concatenation by np.array?
In [12]: x = np.array([np.zeros((3, 2)), np.zeros((5, 8))], dtype=object)
In [13]: x[0] = np.delete(x[0], 1, axis=0)
In [14]: x = np.array([np.zeros((3, 2)), np.zeros((3, 2))], dtype=object)
In [15]: x.shape
Out[15]: (2, 3, 2)
In [16]: x = np.array([np.zeros((3, 2)), np.zeros((5, 8))], dtype=object)
In [17]: x.shape
Out[17]: (2,)
This is quite inconsistent behaviour.

The fact that np.array creates as high a dimensional array as it can has been discussed many times on SO. If the elements are different in size it will keep them separate, or in some cases raise an error.
In your example
In [201]: x = np.array([np.zeros((3, 2)), np.zeros((3, 2))], dtype=object)
In [202]: x
Out[202]:
array([[[0.0, 0.0],
        [0.0, 0.0],
        [0.0, 0.0]],
       [[0.0, 0.0],
        [0.0, 0.0],
        [0.0, 0.0]]], dtype=object)
The safe way to make an object array of a determined size is to initialize it and then fill it:
In [203]: x=np.empty(2, dtype=object)
In [204]: x
Out[204]: array([None, None], dtype=object)
In [205]: x[...] = [np.zeros((3, 2)), np.zeros((3, 2))]
In [206]: x
Out[206]:
array([array([[ 0.,  0.],
              [ 0.,  0.],
              [ 0.,  0.]]),
       array([[ 0.,  0.],
              [ 0.,  0.],
              [ 0.,  0.]])], dtype=object)
A 1d object array like this is, for most practical purposes, a list. Operations on the elements are performed with Python-level iteration, implicit or explicit (as with your list comprehension). Most of the computational power of a multidimensional numeric array is gone.
In [207]: x.shape
Out[207]: (2,)
In [208]: [xx.shape for xx in x] # shape of the elements
Out[208]: [(3, 2), (3, 2)]
In [209]: [xx[:2,:] for xx in x] # slice the elements
Out[209]:
[array([[ 0., 0.],
[ 0., 0.]]), array([[ 0., 0.],
[ 0., 0.]])]
You can reshape such an array, but you can't append as if it were a list. Some math operations cross the 'object' boundary, but it is hit-and-miss. In sum, don't use object arrays when a list would work just as well.
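Putting this together with the original problem, here is a minimal sketch (using the same (3, 2) elements as in the question) of deleting a row from one element without the broadcasting error:
import numpy as np

# Pre-allocate the object array so np.array never tries to build a 3d block.
x = np.empty(2, dtype=object)
x[...] = [np.zeros((3, 2)), np.zeros((3, 2))]

# Each cell just holds a reference, so replacing one element is fine.
x[0] = np.delete(x[0], 1, axis=0)

print([xx.shape for xx in x])   # [(2, 2), (3, 2)]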
Understanding non-homogeneous numpy arrays

This is very hacky and ugly in my opinion, but it's the only solution I could think of: use a list comprehension to convert the object array to a list (.tolist() does not work, as it breaks the subarrays down into nested lists), modify the element, and convert back to an object array.
In [37]: x = np.array([np.zeros((3, 2)), np.zeros((3, 2))], dtype=object)
In [38]: xx = [z for z in x]
In [39]: xx[0] = np.delete(xx[0], 1, axis=0)
In [40]: x = np.array(xx, dtype=object)
In [41]: x
Out[41]:
array([array([[0.0, 0.0],
              [0.0, 0.0]], dtype=object),
       array([[0.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]], dtype=object)], dtype=object)
I think I'll open an issue on the numpy GitHub asking for more consistent behavior of object arrays.

Related

Error when trying to give values to matrix, shows problem with dimensions

I made a numpy array of size 6*6*51 with the command
matrix = np.zeros((6,6,51))
But when I write a loop that places a 6-element array into matrix[i][:][j], it raises
ValueError: could not broadcast input array from shape (6) into shape (51)
What am I doing wrong here?
import numpy as np
a = np.zeros((6,6,51))
You are indexing incorrectly: your subscript should cover all the dimensions at once.
a[i][:] is essentially the same as a[i], so the j in a[i][:][j] ends up indexing the second dimension, not the third.
In [14]: b = a[1][:]
In [15]: b.shape
Out[15]: (6, 51)
In [16]: c = a[1][:][1]
In [17]: c.shape
Out[17]: (51,)
Use the full subscript a[i, :, j] instead.
In [20]: a[1,:,1] = np.arange(6)
In [21]: a[1,:,1]
Out[21]: array([ 0., 1., 2., 3., 4., 5.])
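For instance, a sketch of the kind of loop from the question, using the full subscript (the 6-element values here are just placeholders):
import numpy as np

a = np.zeros((6, 6, 51))

for i in range(6):
    for j in range(51):
        a[i, :, j] = np.arange(6)   # a 6-element array fits a[i, :, j]

print(a[1, :, 1])                   # [0. 1. 2. 3. 4. 5.]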
Numpy Indexing

How to convert 2d np.array of lists of floats into a 2d np.array of floats, stacking the list values to rows

I have a huge 2d numpy array of lists (dtype object) that I want to convert into a 2d numpy array of dtype float, stacking the dimension represented by lists onto the 0th axis (rows). The lists within each row always have the exact same length, and have at least one element.
Here is a minimal reproduction of the situation:
import numpy as np
current_array = np.array(
[[[0.0], [1.0]],
[[2.0, 3.0], [4.0, 5.0]]]
)
desired_array = np.array(
[[0.0, 1.0],
[2.0, 4.0],
[3.0, 5.0]]
)
I looked around for solutions, and stack and dstack functions work only if the first level is a tuple. reshape would require the third level to be a part of the array. I wonder, is there any relatively efficient way to do it?
Currently, I am just counting the dimensions, creating an empty array and filling in the values one by one, which honestly does not seem like a good solution.
In [321]: current_array = np.array(
...: [[[0.0], [1.0]],
...: [[2.0, 3.0], [4.0, 5.0]]]
...: )
In [322]: current_array
Out[322]:
array([[list([0.0]), list([1.0])],
[list([2.0, 3.0]), list([4.0, 5.0])]], dtype=object)
In [323]: _.shape
Out[323]: (2, 2)
Rework the two rows:
In [328]: current_array[1,:]
Out[328]: array([list([2.0, 3.0]), list([4.0, 5.0])], dtype=object)
In [329]: np.stack(current_array[1,:],1)
Out[329]:
array([[2., 4.],
[3., 5.]])
In [330]: np.stack(current_array[0,:],1)
Out[330]: array([[0., 1.]])
combine them:
In [331]: np.vstack((_330, _329))
Out[331]:
array([[0., 1.],
[2., 4.],
[3., 5.]])
in one line:
In [333]: np.vstack([np.stack(row, 1) for row in current_array])
Out[333]:
array([[0., 1.],
[2., 4.],
[3., 5.]])
Author of the question here.
I found a slightly more elegant (and faster) way than filling the array one by one, which is:
desired = np.array([np.concatenate([np.array(d) for d in lis]) for lis in current.T]).T
print(desired)
'''
[[0. 1.]
[2. 4.]
[3. 5.]]
'''
But it still does quite a number of operations: it transposes the table so that np.concatenate can stack the neighbouring 'dimensions' (one of which is the lists), then converts the result back to an np.array and transposes it again.
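For reference, a quick self-contained check that the two approaches agree (dtype=object is added here so that newer numpy versions accept the ragged nesting):
import numpy as np

current = np.array([[[0.0], [1.0]],
                    [[2.0, 3.0], [4.0, 5.0]]], dtype=object)

a = np.vstack([np.stack(row, 1) for row in current])        # the answer above
b = np.array([np.concatenate([np.array(d) for d in lis])
              for lis in current.T]).T                       # the transpose version

print(np.array_equal(a, b))   # True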

Numpy add with multiple arrays

Is there a way to add (as opposed to sum) multiple arrays together in a single operation? Obviously, np.sum and np.add are different operations; however, the problem I'm struggling with right now is that np.add only takes two arrays at a time. I could use either
output = 0
for arr in arr_list:
    output = output + arr
or
output = 0
for arr in arr_list:
    output = np.add(output, arr)
and, yes, this is workable. However, it would be nice if I could simply do some variant of
output = np.add_multiple(arr_list)
Does this exist?
EDIT:
I failed to be clear initially. I specifically need a function that does not require constructing an array of arrays, because my arrays are not of equal dimensions and rely on broadcasting, for example:
a = np.arange(3).reshape(1,3)
b = np.arange(9).reshape(3,3)
a, b = a[:,:,None,None], b[None,None,:,:]
These work:
a + b # Works
np.add(a, b) # Works
These do not, and fail with the same exception:
np.sum([a, b], axis = 0)
np.add.reduce([a, b])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (3,1,1) into shape (1)
You can just use Python's built-in sum:
output = sum(arr_list)
For many other numpy functions, np.<ufunc>.reduce can be used, as shown by @hpaulj.
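The same .reduce pattern works for other ufuncs as well, e.g. an elementwise maximum or product over a list of equal-shaped arrays (a quick sketch):
import numpy as np

arrs = [np.arange(4), np.ones(4), np.array([10, 1, 11, 2])]

print(np.maximum.reduce(arrs))    # elementwise max:     [10.  1. 11.  3.]
print(np.multiply.reduce(arrs))   # elementwise product: [ 0.  1. 22.  6.]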
You can use sum() to add multiple arrays.
arr = np.array([[6,2,3,5,4,3],
                [7,7,2,4,6,7],
                [10,6,2,4,5,9]])
np.add(0, arr.sum(axis=0))
In [18]: alist = [np.arange(4),np.ones(4),np.array([10,1,11,2])]
In [19]: np.add.reduce(alist)
Out[19]: array([11., 3., 14., 6.])
In [20]: alist[0]+alist[1]+alist[2]
Out[20]: array([11., 3., 14., 6.])
And for more fun:
In [21]: np.add.accumulate(alist)
Out[21]:
array([[ 0., 1., 2., 3.],
[ 1., 2., 3., 4.],
[11., 3., 14., 6.]])
Edit:
In [53]: a.shape
Out[53]: (1, 1, 1, 3)
In [54]: b.shape
Out[54]: (3, 3, 1, 1)
Addition with broadcasting:
In [63]: sum([a,b]).shape
Out[63]: (3, 3, 1, 3)
In [64]: (a+b).shape
Out[64]: (3, 3, 1, 3)
In [66]: np.add.reduce([a,b]).shape
Out[66]: (3, 3, 1, 3)
For what it's worth, I was suggesting add.reduce because I thought you wanted to add more than 2 arrays.
All these work as long as the arrays broadcast together.
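For reference, a minimal self-contained run of the broadcasting case, using the shapes from the question's edit and the plain Python sum suggested here:
import numpy as np

a = np.arange(3).reshape(1, 3)[:, :, None, None]   # shape (1, 3, 1, 1)
b = np.arange(9).reshape(3, 3)[None, None, :, :]    # shape (1, 1, 3, 3)

out = sum([a, b])                    # built-in sum: repeated +, so broadcasting applies
print(out.shape)                     # (1, 3, 3, 3)
print(np.array_equal(out, a + b))    # True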

Unsuccessful append to an empty NumPy array

I am trying to fill an empty (not np.empty!) array with values using append, but I am getting an error:
My code is as follows:
import numpy as np
result=np.asarray([np.asarray([]),np.asarray([])])
result[0]=np.append([result[0]],[1,2])
And I am getting:
ValueError: could not broadcast input array from shape (2) into shape (0)
I might understand the question incorrectly, but if you want to declare an array of a certain shape but with nothing inside, the following might be helpful:
Initialise empty array:
>>> a = np.zeros((0,3)) #or np.empty((0,3)) or np.array([]).reshape(0,3)
>>> a
array([], shape=(0, 3), dtype=float64)
Now you can use this array to append rows of similar shape to it. Remember that a numpy array cannot be resized in place, so a new array is created on each iteration:
>>> for i in range(3):
... a = np.vstack([a, [i,i,i]])
...
>>> a
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
np.vstack and np.hstack are the most common methods for combining numpy arrays, but coming from Matlab I prefer np.r_ and np.c_:
Concatenate 1d:
>>> a = np.zeros(0)
>>> for i in range(3):
... a = np.r_[a, [i, i, i]]
...
>>> a
array([ 0., 0., 0., 1., 1., 1., 2., 2., 2.])
Concatenate rows:
>>> a = np.zeros((0,3))
>>> for i in range(3):
... a = np.r_[a, [[i,i,i]]]
...
>>> a
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
Concatenate columns:
>>> a = np.zeros((3,0))
>>> for i in range(3):
... a = np.c_[a, [[i],[i],[i]]]
...
>>> a
array([[ 0., 1., 2.],
[ 0., 1., 2.],
[ 0., 1., 2.]])
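Since each np.vstack / np.r_ call above copies the entire array built so far, when many rows are accumulated in a loop it is often cheaper to collect them in a plain Python list and stack once at the end. A minimal sketch of that pattern, with the same toy rows as above:
import numpy as np

rows = []
for i in range(3):
    rows.append([i, i, i])   # cheap list append; nothing is copied yet

a = np.vstack(rows)          # one allocation and copy at the end
print(a)
# [[0 0 0]
#  [1 1 1]
#  [2 2 2]]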
numpy.append is pretty different from list.append in Python. I know that's thrown off a few programmers new to numpy. numpy.append is more like concatenate: it makes a new array and fills it with the values from the old array plus the new value(s) to be appended. For example:
import numpy
old = numpy.array([1, 2, 3, 4])
new = numpy.append(old, 5)
print(old)
# [1 2 3 4]
print(new)
# [1 2 3 4 5]
new = numpy.append(new, [6, 7])
print(new)
# [1 2 3 4 5 6 7]
I think you might be able to achieve your goal by doing something like:
result = numpy.zeros((10,))
result[0:2] = [1, 2]
# Or
result = numpy.zeros((10, 2))
result[0, :] = [1, 2]
Update:
If you need to create a numpy array using a loop, and you don't know ahead of time what the final size of the array will be, you can do something like:
import random
import numpy as np

a = np.array([0., 1.])
b = np.array([2., 3.])

temp = []
while True:
    rnd = random.randint(0, 100)
    if rnd > 50:
        temp.append(a)
    else:
        temp.append(b)
    if rnd == 0:
        break

result = np.array(temp)
In my example result will be an (N, 2) array, where N is the number of times the loop ran, but obviously you can adjust it to your needs.
New update:
The error you're seeing has nothing to do with types; it has to do with the shapes of the numpy arrays involved. If you do np.append(a, b) with an axis argument, the shapes of a and b need to be compatible: appending a (2, n) and a (1, n) along the first axis gives a (3, n) array, while with no axis both inputs are flattened first. In your code the append itself succeeds and returns a (2,) array, but assigning that back into result[0], which has shape (0,), is where the shapes don't match, so you get an error.
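To make the shape rules concrete, a small sketch (the arrays here are just illustrative):
import numpy as np

a = np.zeros((2, 3))
b = np.ones(3)

print(np.append(a, b).shape)                   # (9,)  -- no axis: inputs are flattened
print(np.append(a, b[None, :], axis=0).shape)  # (3, 3) -- the row reshaped to (1, 3) first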
This error arises from the fact that you are trying to redefine an object of shape (0,) as an object of shape (2,). If you append what you want without forcing it to be equal to result[0], there is no issue:
b = np.append([result[0]], [1,2])
But when you define result[0] = b you are equating objects of different shapes, and you cannot do that. What are you trying to do?
Here's the result of running your code in IPython. Note that result is a (2,0) array: 2 rows, 0 columns, 0 elements. The append produces a (2,) array. result[0] is a (0,) array. Your error message comes from trying to assign that 2-item array into a size-0 slot. Since result has dtype float64, only scalars can be assigned to its elements.
In [65]: result=np.asarray([np.asarray([]),np.asarray([])])
In [66]: result
Out[66]: array([], shape=(2, 0), dtype=float64)
In [67]: result[0]
Out[67]: array([], dtype=float64)
In [68]: np.append(result[0],[1,2])
Out[68]: array([ 1., 2.])
np.array is not a Python list. All elements of an array are the same type (as specified by the dtype). Notice also that result is not an array of arrays.
result could also have been built as
ll = [[],[]]
result = np.array(ll)
And while
ll[0] = [1,2]
# ll = [[1,2],[]]
works for the Python list, the same is not true for result.
np.zeros((2,0)) also produces your result.
Actually there's another quirk to result:
result[0] = 1
does not change the values of result. It accepts the assignment, but since result has 0 columns, there is no place to put the 1. This assignment would work if result had been created as np.zeros((2,1)), but that still can't accept a list.
But if result has 2 columns, then you can assign a 2 element list to one of its rows.
result = np.zeros((2,2))
result[0] # == [0,0]
result[0] = [1,2]
What exactly do you want result to look like after the append operation?
numpy.append always copies the array before appending the new values. Your code is equivalent to the following:
import numpy as np
result = np.zeros((2,0))
new_result = np.append([result[0]],[1,2])
result[0] = new_result # ERROR: result[0] has shape (0,), new_result has shape (2,)
Perhaps you mean to do this?
import numpy as np
result = np.zeros((2,0))
result = np.append([result[0]],[1,2])
SO thread 'Multiply two arrays element wise, where one of the arrays has arrays as elements' has an example of constructing an array from arrays. If the subarrays are the same size, numpy makes a 2d array. But if they differ in length, it makes an array with dtype=object, and the subarrays retain their identity.
Following that, you could do something like this:
In [5]: result=np.array([np.zeros((1)),np.zeros((2))])
In [6]: result
Out[6]: array([array([ 0.]), array([ 0., 0.])], dtype=object)
In [7]: np.append([result[0]],[1,2])
Out[7]: array([ 0., 1., 2.])
In [8]: result[0]
Out[8]: array([ 0.])
In [9]: result[0]=np.append([result[0]],[1,2])
In [10]: result
Out[10]: array([array([ 0., 1., 2.]), array([ 0., 0.])], dtype=object)
However, I don't offhand see what advantages this has over a pure Python list of lists. It does not work like a 2d array; for example, I have to use result[0][1], not result[0,1]. If the subarrays are all the same length, I have to use np.array(result.tolist()) to produce a 2d array.
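A quick sketch of that last point, assuming equal-length subarrays:
import numpy as np

r = np.empty(2, dtype=object)
r[:] = [np.array([0., 1., 2.]), np.array([3., 4., 5.])]

print(r[0][1])                       # 1.0  -- two-step indexing; r[0, 1] would fail
print(np.array(r.tolist()).shape)    # (2, 3) -- back to an ordinary 2d array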

Good ways to "expand" a numpy ndarray?

Are there good ways to "expand" a numpy ndarray? Say I have an ndarray like this:
[[1 2]
[3 4]]
And I want each row to contain more elements, filled with zeros:
[[1 2 0 0 0]
[3 4 0 0 0]]
I know there must be some brute-force ways to do so (say, construct a bigger array of zeros and then copy the elements from the old, smaller array), but I'm wondering whether there are more pythonic ways to do it. I tried numpy.reshape but it didn't work:
import numpy as np
a = np.array([[1, 2], [3, 4]])
np.reshape(a, (2, 5))
Numpy complains that: ValueError: total size of new array must be unchanged
You can use numpy.pad, as follows:
>>> import numpy as np
>>> a=[[1,2],[3,4]]
>>> np.pad(a, ((0,0),(0,3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
Here np.pad says, "Take the array a and add 0 rows above it, 0 rows below it, 0 columns to the left of it, and 3 columns to the right of it. Fill these columns with a constant specified by constant_values".
There are the index tricks r_ and c_.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> z = np.zeros((2, 3), dtype=a.dtype)
>>> np.c_[a, z]
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
If this is performance critical code, you might prefer to use the equivalent np.concatenate rather than the index tricks.
>>> np.concatenate((a,z), axis=1)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
There are also np.resize and np.ndarray.resize, but they have some limitations (due to the way numpy lays out data in memory) so read the docstring on those ones. You will probably find that simply concatenating is better.
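To illustrate those limitations with a small sketch: np.resize recycles the flattened data rather than padding with zeros, and ndarray.resize does pad with zeros but works on the flat, C-ordered data, so neither gives the row-wise padding asked about here.
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(np.resize(a, (2, 5)))
# [[1 2 3 4 1]
#  [2 3 4 1 2]]

b = a.copy()
b.resize((2, 5), refcheck=False)
print(b)
# [[1 2 3 4 0]
#  [0 0 0 0 0]]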
By the way, when I've needed to do this I usually just do it the basic way you've already mentioned (create an array of zeros and assign the smaller array inside it); I don't see anything wrong with that!
Just to be clear: there's no "good" way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory, a combination of the number of its elements and the size of each element, is fixed and cannot be changed. The only thing you can do is to create a new array and replace some of its elements by the elements of the original array.
A lot of functions are available for convenience (the np.concatenate function and its np.*stack shortcuts, np.column_stack, the index routines np.r_ and np.c_, ...), but they are just that: convenience functions. Some of them are optimized at the C level (np.concatenate and others, I think), some are not.
Note that there's nothing wrong at all with your initial suggestion of creating a large array 'by hand' (possibly filled with zeros) and filling it yourself with your initial array. It might be more readable than more complicated solutions.
A simple way:
# what you want to expand
x = np.ones((3, 3))
# expand to what shape
target = np.zeros((6, 6))
# do expand
target[:x.shape[0], :x.shape[1]] = x
# print target
array([[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]])
Functional way:
borrow from https://stackoverflow.com/a/35751427/1637673, with a little modification.
import numpy as np

def pad(array, reference_shape, offsets=None):
    """
    array: array to be padded
    reference_shape: tuple giving the shape of the padded result
    offsets: list of offsets (number of elements must equal the dimension of the array);
             raises a ValueError if the offsets plus the array shape exceed reference_shape
    """
    if offsets is None:
        offsets = np.zeros(array.ndim, dtype=np.int32)
    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape, dtype=np.float32)
    # Build a slice from offset to offset + shape in each dimension
    # (a tuple of slices, since indexing with a list of slices is no longer allowed)
    insertHere = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim])
                       for dim in range(array.ndim))
    # Insert the array in the result at the specified offsets
    result[insertHere] = array
    return result
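Example usage of the helper above, reproducing the padding from the question:
a = np.array([[1., 2.], [3., 4.]])
print(pad(a, (2, 5)))
# [[1. 2. 0. 0. 0.]
#  [3. 4. 0. 0. 0.]]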
You should use np.column_stack or append
import numpy as np
p = np.array([ [1,2] , [3,4] ])
p = np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
p
Out[277]:
array([[1, 2, 0, 0],
[3, 4, 0, 0]])
Append seems to be faster though:
timeit np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
10000 loops, best of 3: 61.8 us per loop
timeit np.append(p, [[0,0],[0,0]],1)
10000 loops, best of 3: 48 us per loop
And a comparison with np.c_ and np.hstack [append still seems to be the fastest]:
In [295]: z = np.zeros((2, 2), dtype=p.dtype)
In [296]: timeit np.c_[p, z]
10000 loops, best of 3: 47.2 us per loop
In [297]: timeit np.append(p, z,1)
100000 loops, best of 3: 13.1 us per loop
In [305]: timeit np.hstack((p,z))
10000 loops, best of 3: 20.8 us per loop
and np.concatenate [which is even a bit faster than append]:
In [307]: timeit np.concatenate((p, z), axis=1)
100000 loops, best of 3: 11.6 us per loop
There are also similar methods like np.vstack, np.hstack, np.dstack. I like these over np.concatenate as they make it clear which dimension is being "expanded".
temp = np.array([[1, 2], [3, 4]])
np.hstack((temp, np.zeros((2,3))))
It's easy to remember because numpy's first axis is vertical, so vstack expands along the first axis, and the second axis is horizontal, so hstack.
