Copying a list of numpy arrays - python

I am working with (lists of) lists of numpy arrays. As a bare bones example, consider this piece of code:
a = [np.zeros(5)]
b = a.copy()
b[0] += 1
Here, I copy a list of one array from a to b. However, the array itself is not copied, so:
print(a)
print(b)
both give [array([1., 1., 1., 1., 1.])]. If I want to make a copy of the array as well, I could do something like:
b = [arr.copy() for arr in a]
and a would remain unchanged. This works well for a simple list, but it becomes more complicated when working with nested lists of arrays where the number of arrays in each list is not always the same.
Is there a simple way to copy a multi-level list and every object that it contains without keeping references to the objects in the original list? Basically, I would like to avoid nested loops as well as dealing with the size of each individual sub-list.

What you are looking for is copy.deepcopy:
import numpy as np
import copy
a = [np.zeros(5)]
b = copy.deepcopy(a)
b[0] += 1 # a[0] is not changed
This is actually the method recommended in the NumPy documentation for deep-copying object arrays.
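For the nested case the question asks about, here is a minimal sketch. The structure `nested` is a made-up example with sub-lists of different lengths; `copy.deepcopy` recurses through it without any explicit loops.

```python
import copy
import numpy as np

# Hypothetical nested structure: sub-lists hold different numbers of arrays.
nested = [[np.zeros(3)], [np.ones(2), [np.arange(4.0)]]]

clone = copy.deepcopy(nested)
clone[1][1][0] += 10          # modify an array deep inside the copy

# The original is untouched, and the two arrays do not share memory.
print(nested[1][1][0])                                      # [0. 1. 2. 3.]
print(np.shares_memory(nested[1][1][0], clone[1][1][0]))    # False
```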

You need to use deepcopy.
import numpy as np
import copy
a = [np.zeros(5)]
b = copy.deepcopy(a)
b[0] += 1
print(a)
print(b)
Result:
[array([0., 0., 0., 0., 0.])]
[array([1., 1., 1., 1., 1.])]

Related

Adding arrays which may contain 'None'-entries

I have a question regarding the addition of numpy arrays.
Let's assume I have defined a function
def foo(a,b):
    return a+b
that takes two arrays of the same shape and simply returns their sum.
Now, I have to deal with cases where some of the entries may be None.
I would like to treat those entries as if they were float(0), such that
[1.0,None,2.0] + [1.0,2.0,2.0]
would add up to
[2.0,2.0,4.0]
Can you provide me with an already-implemented solution?
TIA
I suggest numpy.nan_to_num:
>>> np.nan_to_num(np.array([1.0,None,2.0], dtype=float))
array([ 1., 0., 2.])
Then,
>>> def foo(a,b):
...     return np.nan_to_num(a) + np.nan_to_num(b)
...
>>> foo(np.array([1.0,None,2.0], dtype=float), np.array([1.0,2.0,2.0], dtype=float))
array([ 2., 2., 4.])
Usually, the answer to this is to use an array of floats, rather than an array of arbitrary objects, and then use np.nan instead of None. NaN has well-defined semantics for arithmetic. (Also, using an array of floats instead of objects will make your code significantly more time and space efficient.)
Notice that you don't have to manually convert None to np.nan if you build the array with an explicit dtype of float or np.float64. Both of these are equivalent:
>>> a = np.array([1.0,np.nan,2.0])
>>> a = np.array([1.0,None,2.0],dtype=float)
Which means that if, for some reason, you really needed arrays of arbitrary objects with actual None in them, you could do that, and then convert it to an array of floats on the fly to get the benefits of NaN:
>>> a.astype(float) + b.astype(float)
At any rate, in this case, just using NaN isn't sufficient:
>>> a = np.array([1.0,np.nan,2.0])
>>> b = np.array([1.0,2.0,2.0])
>>> a + b
array([ 2., nan, 4.])
That's because the semantics of NaN are that the result of any operation with NaN returns NaN. But you want to treat it as 0.
But it does make the problem easy to solve. The simplest way is the function nan_to_num, which replaces NaN with 0 by default:
>>> np.nan_to_num(a)
array([1., 0., 2.])
>>> np.nan_to_num(a) + np.nan_to_num(b)
array([2., 2., 4.])
You can use column_stack to concatenate both arrays along the second axis, then use np.nansum() to sum the items over that axis.
In [15]: a = np.array([1.0,None,2.0], dtype=float)
# Using dtype here is necessary to convert None to np.nan
In [16]: b = np.array([1.0,2.0,2.0])
In [17]: np.nansum(np.column_stack((a, b)), 1)
Out[17]: array([2., 2., 4.])
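Putting the answers above together, a small sketch of the requested function. The name `add_none_as_zero` is made up for illustration. Note that `np.nan_to_num` also replaces infinities with large finite numbers, which may or may not be what you want.

```python
import numpy as np

def add_none_as_zero(a, b):
    """Add two sequences element-wise, treating None/NaN entries as 0.0."""
    # Building the arrays with dtype=float converts None to NaN.
    a = np.array(a, dtype=float)
    b = np.array(b, dtype=float)
    # nan_to_num replaces NaN with 0.0 before the addition.
    return np.nan_to_num(a) + np.nan_to_num(b)

print(add_none_as_zero([1.0, None, 2.0], [1.0, 2.0, 2.0]))  # [2. 2. 4.]
```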

How does numpy array typing interact with object?

I am currently trying to implement a datatype that stores floats in a numpy array. However, building an array from elements of this type with different lengths breaks the code, because it would mean assigning a sequence to a single array element, which is not possible.
One can bypass this by using the data type object instead of float. Why is that? How could one resolve this problem using floats without creating a sequence?
Example code that does not work.
from numpy import *
foo= dtype(float32, [])
x = array([[2., 3.], [3.]], dtype=foo)
Example code that does work:
from numpy import *
foo= dtype(float32, [])
x = array([[2., 3.], [3., 2.]], dtype=foo)
Example code that does work, and which I am trying to replicate for float:
from numpy import *
foo= dtype(object, [])
x = array([[2., 3.], [3.]], dtype=foo)
The object dtype in Numpy simply creates an array of pointers to Python objects. This means you lose the performance advantage you usually get from Numpy, but it's still sometimes useful to do this.
Your last example creates a one-dimensional Numpy array of length two, so that's two pointers to Python objects. Both of these objects happen to be lists, and Python lists have arbitrary dynamic length.
I don't know what you were trying to achieve with this, but note that
>>> np.dtype(np.float32, []) == np.float32
True
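To make the point about the object array concrete, here is a short sketch (variable names are illustrative) comparing the ragged object array with a regular float array:

```python
import numpy as np

# Ragged input with dtype=object: a 1D array holding two Python lists.
ragged = np.array([[2., 3.], [3.]], dtype=object)
print(ragged.shape)       # (2,)
print(type(ragged[0]))    # <class 'list'>

# Rectangular input: a genuine 2D float array.
regular = np.array([[2., 3.], [3., 2.]], dtype=np.float32)
print(regular.shape)      # (2, 2)
```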
Arrays require the same number of elements for each row. So, if you feed a list of lists in numpy and all sublists have the same number of elements, it'll happily convert it to an array. This is why your second example works.
If the sublists are not the same length, then each sublist is treated as a single object and you end up with a 1D array of objects (recent NumPy versions require dtype=object to be given explicitly for this). This is why your third example works. Your first example doesn't work because you try to cast a sequence of objects to floats, which isn't possible.
In short, you can't create an array of floats if your sublists are of different lengths. At best, you can create an array of 1D arrays, since they are still considered objects.
>>> x = np.array(list(map(np.array, [[2., 3.], [3.]])), dtype=object)
>>> x
array([array([ 2., 3.]), array([ 3.])], dtype=object)
>>> x[0]
array([ 2., 3.])
>>> x[0][1]
3.0
>>> # but you can't do this
>>> x[0,1]
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    x[0,1]
IndexError: too many indices for array
If you're bent on creating a float 2D array, you have to extend all your sublists to the same size with None, which will be converted to np.nan.
>>> lists = [[2., 3.], [3.]]
>>> max_len = max(map(len, lists))
>>> for i, sublist in enumerate(lists):
...     sublist = sublist + [None] * (max_len - len(sublist))
...     lists[i] = sublist
...
>>> np.array(lists, dtype=np.float32)
array([[ 2., 3.],
       [ 3., nan]], dtype=float32)
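The same padding idea can be written without mutating the input lists. This is only a sketch: `pad_ragged` is a hypothetical helper name, and it assumes a non-empty list of 1D sublists.

```python
import numpy as np

def pad_ragged(lists, fill=np.nan, dtype=np.float32):
    """Return a 2D array where shorter rows are padded with `fill`."""
    max_len = max(len(sub) for sub in lists)
    # Start from an array full of the padding value...
    out = np.full((len(lists), max_len), fill, dtype=dtype)
    # ...then copy each sublist into the leading part of its row.
    for i, sub in enumerate(lists):
        out[i, :len(sub)] = sub
    return out

padded = pad_ragged([[2., 3.], [3.]])
print(padded.shape)   # (2, 2)
```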

How to find the most frequent element in each row of a multidimensional array?

Suppose I have an array:
import numpy as np
a=np.array([[1,0,0,1,0,0,0,1],[1,1,1,1,0,0,1,1]])
How can I compute another array b holding the most frequent value of each row? I.e.:
b=([[0],[1]])
If you have scipy available, you could use scipy.stats.mode:
>>> a = np.array([[1,0,0,1,0,0,0,1],[1,1,1,1,0,0,1,1]])
>>> import scipy.stats
>>> most, mostcc = scipy.stats.mode(a, axis=1)
>>> most
array([[ 0.],
[ 1.]])
>>> mostcc
array([[ 5.],
[ 6.]])
Note, from the docs:
If there is more than one such value, only the first is returned.
The bin-count for the modal bins is also returned.
Just loop through each element of the array; in this case the elements are [1,0,0,1,0,0,0,1] and [1,1,1,1,0,0,1,1].
for myarr in a:
Loop through each element of myarr:
for myele in myarr:
Use a dict to keep track of the number of occurrences:
mydict[myele] = mydict.get(myele, 0) + 1
Keep track of the element with the highest count:
if mydict[myele] > mydict.get(mymost, 0): mymost = myele
After finishing up with myarr, append mymost to your result:
mymosts.append(mymost)
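Assembled into one runnable function, the steps above might look like the following sketch. The function name and the `best_count` variable are additions for illustration, and how ties are broken is simply a consequence of the strict comparison, not something the original answer specifies.

```python
import numpy as np

def most_common_per_row(a):
    """Return a list with the most frequent value of each row of `a`."""
    mymosts = []
    for myarr in a:
        mydict = {}                      # value -> number of occurrences
        mymost, best_count = None, 0
        for myele in np.asarray(myarr).tolist():
            mydict[myele] = mydict.get(myele, 0) + 1
            # Update the running winner when this value takes the lead.
            if mydict[myele] > best_count:
                mymost, best_count = myele, mydict[myele]
        mymosts.append(mymost)
    return mymosts

a = np.array([[1, 0, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 0, 0, 1, 1]])
print(most_common_per_row(a))  # [0, 1]
```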
How about:
b = [np.bincount(x).argmax() for x in a]
or to get the format you were showing:
b = ([[np.bincount(x).argmax()] for x in a])
If you want this to work with floats, using a Counter would be the best standard-library solution.
from collections import Counter
b = ([[Counter(x).most_common()[0][0]] for x in a])

Efficient way to drop a column from a Numpy array?

If I have a very large numpy array with one useless column, how could I drop it without creating a copy of the original array?
np.delete(my_np_array, 0, 1)
The above code will return a copy of the array without the zero-th column. But instead I would like to simply delete that column from my_np_array since I don't need it. For very large datasets, the memory management becomes important and copying may not be an option.
If memory is the main concern, you can move columns around within your array so that the unneeded column ends up at the very end, then use ndarray.resize, which modifies the array in-place, to shrink it and discard that last column.
You cannot simply remove the first column of an array in-place using the provided API, and I suspect it is because of the memory layout of an ndarray that maps multidimensional indexing to unidimensional byte-oriented addressing within blocks of contiguous memory.
The following example copies the last column into the first and then deletes the last (now unneeded), immediately purging the associated memory. So it basically removes the obsolete column from memory completely, at the cost of changing your column order. Note that resize truncates the data in memory order, so this drops the last column only when A is stored column-major (Fortran order); with the default row-major layout it would discard the trailing elements of the flattened array instead.
D1, D2 = A.shape
A[:, 0] = A[:, D2-1]
A.resize((D1, D2-1), refcheck=False)
A.shape
# => would be (5, 4) if the shape was initially (5, 5) for example
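A self-contained version of this trick, as a sketch. It assumes the array is stored column-major (`order="F"`) and owns its data, since `ndarray.resize` cuts off the end of the underlying buffer rather than a logical column. The 5x4 array is made-up example data.

```python
import numpy as np

# Column-major (Fortran-ordered) array that owns its data.
A = np.empty((5, 4), order="F")
A[...] = np.arange(20.0).reshape(5, 4)

D1, D2 = A.shape
A[:, 0] = A[:, D2 - 1]                   # overwrite unwanted column 0 with the last column
A.resize((D1, D2 - 1), refcheck=False)   # shrink in place, dropping the last column

print(A.shape)   # (5, 3)
print(A[0])      # [3. 1. 2.]
```

With a default C-ordered array the same call would keep the first 15 elements of the flattened data, which scrambles the rows, so check `A.flags` before relying on this.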
If you use slicing, numpy won't make a copy; in other words:
a = numpy.array([1, 2, 3, 4, 5])
b = a[1:] # view elements from second to last, NOT making a copy
b[0] = 12 # Change first element of `b`, i.e. second of `a`
print(a)
will print [1, 12, 3, 4, 5]
If you need to delete an element in the middle, however, a single slice won't work.
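Applied to the original question (dropping the zero-th column), a slice gives a view that skips the column without copying. A minimal sketch with made-up data; note that the full buffer, including the unused column, stays allocated for as long as the view exists.

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)
trimmed = A[:, 1:]                     # view of A without column 0

print(trimmed.shape)                   # (3, 3)
print(np.shares_memory(A, trimmed))    # True

trimmed[0, 0] = -1.0                   # writes through to A[0, 1]
print(A[0, 1])                         # -1.0
```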
NumPy arrays occupy a fixed-size block of memory, so a column generally can't be removed without creating an intermediate copy.
See: How to remove specific elements in a numpy array
Creating a view with slicing and making a copy of that is probably the fastest you can do.
In [804]: a = np.ones((2,2))
In [805]: a
Out[805]:
array([[ 1., 1.],
[ 1., 1.]])
In [806]: np.resize(a,(3,2))
Out[806]:
array([[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
In [807]: a  # a would now have shape (3, 2) if np.resize worked in place
Out[807]:
array([[ 1., 1.],
[ 1., 1.]])
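Note that the session above uses the function `np.resize`, which always returns a new array. The method `ndarray.resize` is a different call that does change the array itself. A short sketch contrasting the two:

```python
import numpy as np

a = np.ones((2, 2))

b = np.resize(a, (3, 2))           # function: returns a new array, a is untouched
print(a.shape, b.shape)            # (2, 2) (3, 2)

a.resize((3, 2), refcheck=False)   # method: changes a itself
print(a.shape)                     # (3, 2)
print(a[2])                        # [0. 0.]
```

The function fills the extra row by repeating the input, while the method fills it with zeros. The method may still reallocate the buffer internally.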

How to store data into a view in numpy?

How do I store data into a numpy view without changing the view into a copy? This code snippet exemplifies my question:
>>> import numpy as np
>>> #-- init arrays and view
>>> a = np.ones([4])
>>> z = np.zeros([2,4])
>>> z0 = z[0,:] #-- view
>>> z0.flags.owndata
False
>>> #-- This works!
>>> #-- modify view in-place
>>> np.add(a,z0,z0)
>>> z0.flags.owndata
False
>>> z
array([[ 1., 1., 1., 1.],
[ 0., 0., 0., 0.]])
>>> #-- reinit arrays and view
>>> z = np.zeros([2,4])
>>> z0 = z[0,:] #-- view
>>> #-- This does NOT work!
>>> #-- store data into view
>>> z0 = a
>>> z0.flags.owndata
True
I know about in-place modifications using += -= *= /= and numpy functions that take an out parameter, so you can do things like np.abs(x, x) to take the absolute value of x in-place.
But how to just store data into a view without modification?
Abusing the add function (to add zero and store) works but doesn't feel 'right':
np.add(a,0,z0)
When you do z0 = a, you rebind the name z0 to the same object as a; nothing is written into the view. What you want to do is this:
z0[...] = a
This uses slice assignment, which goes through Python's __setitem__ and writes into the existing buffer in place. On NumPy 1.7 or later you could use np.copyto as well, which is probably a little faster, but I personally like the slicing syntax.
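Both options side by side, as a minimal sketch reusing the names from the question:

```python
import numpy as np

a = np.ones(4)
z = np.zeros((2, 4))
z0 = z[0, :]                  # view into the first row of z

z0[...] = a                   # slice assignment writes into z's buffer
print(z[0])                   # [1. 1. 1. 1.]
print(z0.flags.owndata)       # False, z0 is still a view

np.copyto(z[1, :], 2 * a)     # same idea using np.copyto(dst, src)
print(z[1])                   # [2. 2. 2. 2.]
```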
