How to store data into a view in numpy?

How do I store data into a numpy view without changing the view into a copy? This code snippet exemplifies my question:
>>> import numpy as np
>>> #-- init arrays and view
>>> a = np.ones([4])
>>> z = np.zeros([2,4])
>>> z0 = z[0,:] #-- view
>>> z0.flags.owndata
False
>>> #-- This works!
>>> #-- modify view in-place
>>> np.add(a, z0, z0)
array([ 1.,  1.,  1.,  1.])
>>> z0.flags.owndata
False
>>> z
array([[ 1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.]])
>>> #-- reinit arrays and view
>>> z = np.zeros([2,4])
>>> z0 = z[0,:] #-- view
>>> #-- This does NOT work!
>>> #-- store data into view
>>> z0 = a
>>> z0.flags.owndata
True
I know about in-place modifications using +=, -=, *= and /=, and about numpy functions that take an out parameter, so you can do things like np.abs(x, x) to take the absolute value of x in-place.
But how to just store data into a view without modification?
Abusing the add function (to add zero and store) works but doesn't feel 'right':
np.add(a,0,z0)

When you do z0 = a, the name z0 is simply rebound, so z0 is the same object as a by Python logic and the view is lost. What you want to do is this:
z0[...] = a
using the slicing syntax, which goes through the in-place __setitem__ machinery. On numpy 1.7 or later you can use np.copyto as well, which is probably a little faster, but I like the slicing syntax personally.
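A quick sanity check of both approaches, reusing a and z from the question (a minimal sketch):
>>> z = np.zeros([2, 4])
>>> z0 = z[0, :]
>>> z0[...] = a        # in-place assignment; z0 stays a view
>>> z0.flags.owndata
False
>>> np.copyto(z0, a)   # same effect, numpy >= 1.7
>>> z
array([[ 1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.]])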

Related

Copying a list of numpy arrays

I am working with (lists of) lists of numpy arrays. As a bare bones example, consider this piece of code:
a = [np.zeros(5)]
b = a.copy()
b[0] += 1
Here, I copy a list of one array from a to b. However, the array itself is not copied, so:
print(a)
print(b)
both give [array([1., 1., 1., 1., 1.])]. If I want to make a copy of the array as well, I could do something like:
b = [arr.copy() for arr in a]
and a would remain unchanged. This works well for a simple list, but it becomes more complicated when working with nested lists of arrays where the number of arrays in each list is not always the same.
Is there a simple way to copy a multi-level list and every object that it contains without keeping references to the objects in the original list? Basically, I would like to avoid nested loops as well as dealing with the size of each individual sub-list.
What you are looking for is a deepcopy
import numpy as np
import copy
a = [np.zeros(5)]
b = copy.deepcopy(a)
b[0] += 1 # a[0] is not changed
This is actually the method recommended in the numpy docs for deep-copying object arrays.
You need to use deepcopy.
import numpy as np
import copy
a = [np.zeros(5)]
b = copy.deepcopy(a)
b[0] += 1
print(a)
print(b)
Result:
[array([0., 0., 0., 0., 0.])]
[array([1., 1., 1., 1., 1.])]
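deepcopy also covers the nested, unequal-length case from the question; a minimal sketch:
a = [[np.zeros(5)], [np.zeros(3), np.zeros(2)]]
b = copy.deepcopy(a)
b[1][0] += 1
print(a[1][0])  # still all zeros; only b's copy changed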

Adding arrays which may contain 'None'-entries

I have a question regarding the addition of numpy arrays.
Let's assume I have defined a function
def foo(a, b):
    return a + b
that takes two arrays of the same shape and simply returns their sum.
Now, I have to deal with the cases that some of the entries may be None.
I would like to deal with those entries as they correspond to float(0), such that
[1.0,None,2.0] + [1.0,2.0,2.0]
would add up to
[2.0,2.0,4.0]
Can you provide me with an already-implemented solution?
TIA
I suggest numpy.nan_to_num:
>>> np.nan_to_num(np.array([1.0, None, 2.0], dtype=float))
array([ 1.,  0.,  2.])
Then,
>>> def foo(a, b):
...     return np.nan_to_num(a) + np.nan_to_num(b)
...
>>> foo(np.array([1.0, None, 2.0], dtype=float), np.array([1.0, 2.0, 2.0], dtype=float))
array([ 2.,  2.,  4.])
Usually, the answer to this is to use an array of floats, rather than an array of arbitrary objects, and then use np.nan instead of None. NaN has well-defined semantics for arithmetic. (Also, using an array of floats instead of objects will make your code significantly more time and space efficient.)
Notice that you don't have to manually convert None to np.nan if you build the array with an explicit dtype of float or np.float64. Both of these are equivalent:
>>> a = np.array([1.0,np.nan,2.0])
>>> a = np.array([1.0,None,2.0],dtype=float)
Which means that if, for some reason, you really needed arrays of arbitrary objects with actual None in them, you could do that, and then convert it to an array of floats on the fly to get the benefits of NaN:
>>> a.astype(float) + b.astype(float)
At any rate, in this case, just using NaN isn't sufficient:
>>> a = np.array([1.0,np.nan,2.0])
>>> b = np.array([1.0,2.0,2.0])
>>> a + b
array([ 2., nan, 4.])
That's because the semantics of NaN are that the result of any operation with NaN returns NaN. But you want to treat it as 0.
But it does make the problem easy to solve. The simplest way is nan_to_num, which replaces NaN with 0 by default (note that passing 0 as a second positional argument would bind to the copy parameter, not to the replacement value):
>>> np.nan_to_num(a)
array([ 1.,  0.,  2.])
>>> np.nan_to_num(a) + np.nan_to_num(b)
array([ 2.,  2.,  4.])
You can use column_stack to concatenate both arrays along the second axis, then use np.nansum() to sum the items over that axis.
In [15]: a = np.array([1.0, None, 2.0], dtype=float)
# specifying a float dtype here is necessary to convert None to np.nan
In [16]: b = np.array([1.0,2.0,2.0])
In [17]: np.nansum(np.column_stack((a, b)), 1)
Out[17]: array([2., 2., 4.])
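Putting the pieces together, a minimal sketch of foo that treats None/NaN entries as zero (np.stack needs numpy 1.10+; this is one way, not the only one):
def foo(a, b):
    a = np.asarray(a, dtype=float)  # None entries become nan
    b = np.asarray(b, dtype=float)
    return np.nansum(np.stack([a, b]), axis=0)  # nan counted as 0

foo([1.0, None, 2.0], [1.0, 2.0, 2.0])  # -> array([2., 2., 4.])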

How does numpy array typing interact with object?

I am currently trying to implement a datatype that stores floats in a numpy array. However, trying to build an array whose elements have various lengths breaks the code: one would have to assign a sequence to a single array element, which is not possible.
One can bypass this by using the dtype object instead of float. Why is that? How could one resolve this problem using floats, without creating a sequence?
Example code that does not work:
from numpy import *
foo = dtype(float32, [])
x = array([[2., 3.], [3.]], dtype=foo)
Example code that does work:
from numpy import *
foo = dtype(float32, [])
x = array([[2., 3.], [3., 2.]], dtype=foo)
Example code that does work, I try to replicate for float:
from numpy import *
foo = dtype(object, [])
x = array([[2., 3.], [3.]], dtype=foo)
The object dtype in Numpy simply creates an array of pointers to Python objects. This means you lose the performance advantage you usually get from Numpy, but it's still sometimes useful to do this.
Your last example creates a one-dimensional Numpy array of length two, so that's two pointers to Python objects. Both of these objects happen to be lists, and Python lists have arbitrary dynamic length.
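A quick check of that (output from Python 3; treat it as illustrative):
>>> x = np.array([[2., 3.], [3.]], dtype=object)
>>> x.shape
(2,)
>>> type(x[0]), type(x[1])
(<class 'list'>, <class 'list'>)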
I don't know what you were trying to achieve with this, but note that
>>> np.dtype(np.float32, []) == np.float32
True
Arrays require the same number of elements for each row. So, if you feed a list of lists into numpy and all sublists have the same number of elements, it'll happily convert it to an array. This is why your second example works.
If the sublists are not the same length, then each sublist is treated as a single object and you end up with a 1D array of objects. This is why your third example works. Your first example doesn't work because you try to cast a sequence of objects to floats, which isn't possible.
In short, you can't create an array of floats if your sublists are of different lengths. At best, you can create an array of 1D arrays, since they are still considered objects.
>>> x = np.array(list(map(np.array, [[2., 3.], [3.]])))
>>> x
array([array([ 2.,  3.]), array([ 3.])], dtype=object)
>>> x[0]
array([ 2.,  3.])
>>> x[0][1]
3.0
>>> # but you can't do this
>>> x[0,1]
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    x[0,1]
IndexError: too many indices for array
If you're bent on creating a float 2D array, you have to extend all your sublists to the same size with None, which will be converted to np.nan.
>>> lists = [[2., 3.], [3.]]
>>> max_len = max(map(len, lists))
>>> for i, sublist in enumerate(lists):
...     sublist = sublist + [None] * (max_len - len(sublist))
...     lists[i] = sublist
...
>>> np.array(lists, dtype=np.float32)
array([[  2.,   3.],
       [  3.,  nan]], dtype=float32)
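The same padding can also be sketched with itertools.zip_longest, which inserts the None entries for you (just an alternative, not a requirement):
>>> from itertools import zip_longest
>>> lists = [[2., 3.], [3.]]
>>> np.array(list(zip(*zip_longest(*lists))), dtype=np.float32)
array([[  2.,   3.],
       [  3.,  nan]], dtype=float32)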

How to find the most frequent element in each row of a multidimensional array?

Suppose I have an array:
import numpy as np
a=np.array([[1,0,0,1,0,0,0,1],[1,1,1,1,0,0,1,1]])
How can I compute another array b containing the most frequent value of each row? I.e.
b = [[0], [1]]
If you have scipy available, you could use scipy.stats.mode:
>>> a = np.array([[1,0,0,1,0,0,0,1],[1,1,1,1,0,0,1,1]])
>>> import scipy.stats
>>> most, mostcc = scipy.stats.mode(a, axis=1)
>>> most
array([[ 0.],
       [ 1.]])
>>> mostcc
array([[ 5.],
       [ 6.]])
Note, from the docs:
If there is more than one such value, only the first is returned.
The bin-count for the modal bins is also returned.
Just loop through each row of the array (here the rows are [1,0,0,1,0,0,0,1] and [1,1,1,1,0,0,1,1]), use a dict to keep track of the number of occurrences of each element, and remember the value seen most often:
mymosts = []
for myarr in a:
    mydict = {}
    mymost, best = None, 0
    for myele in myarr:
        mydict[myele] = mydict.get(myele, 0) + 1  # count occurrences
        if mydict[myele] > best:                  # keep track of the highest count
            mymost, best = myele, mydict[myele]
    mymosts.append(mymost)  # after finishing myarr, record its most common value
How about:
b = [np.bincount(x).argmax() for x in a]
or to get the format you were showing:
b = ([[np.bincount(x).argmax()] for x in a])
If you want this to work with floats, using a Counter would be the best standard-library solution.
from collections import Counter
b = ([[Counter(x).most_common()[0][0]] for x in a])
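np.unique with return_counts=True (numpy 1.9+) also works for floats; a minimal sketch, where row_mode is just an illustrative helper name:
def row_mode(x):
    vals, counts = np.unique(x, return_counts=True)
    return vals[counts.argmax()]  # first value with the highest count

b = [[row_mode(x)] for x in a]  # [[0], [1]]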

Numpy: a 2-row, 1-column file, but loadtxt() returns 1 row, 2 columns

2.765334406984874427e+00
3.309563282821381680e+00
The file looks like the above: 2 rows, 1 column.
numpy.loadtxt() returns
[ 2.76533441 3.30956328]
Please don't tell me to use array.transpose() in this case; I need a real solution. Thank you in advance!
You can always use the reshape command. A single-column text file loads as a 1D array, which numpy prints like a row vector.
>>> a
array([ 2.76533441,  3.30956328])
>>> a[:,None]
array([[ 2.76533441],
       [ 3.30956328]])
>>> b = np.arange(5)[:,None]
>>> b
array([[0],
       [1],
       [2],
       [3],
       [4]])
>>> np.savetxt('something.npz', b)
>>> np.loadtxt('something.npz')
array([ 0.,  1.,  2.,  3.,  4.])
>>> np.loadtxt('something.npz').reshape(-1,1)  # another way of doing it
array([[ 0.],
       [ 1.],
       [ 2.],
       [ 3.],
       [ 4.]])
You can check this using the number of dimensions.
data = np.loadtxt('data.npz')
if data.ndim == 1:
    data = data[:, None]
Or
np.loadtxt('something.npz', ndmin=2)  # always gives at least a 2D array
Although it's worth pointing out that if you have a single column of data, numpy will load it as a 1D array by default. This is more of a feature of numpy arrays than a bug, I believe.
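For the question's two-row, one-column file this gives the expected shape (a quick check, reusing the data.npz name from above):
>>> np.loadtxt('data.npz', ndmin=2).shape
(2, 1)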
If you like, you can use matrix to read from a string. Let test.txt contain the content above. Here's a function for your needs:
import numpy as np
def my_loadtxt(filename):
    # joining lines with ';' makes np.matrix parse them as rows
    return np.array(np.matrix(open(filename).read().strip().replace('\n', ';')))

a = my_loadtxt('test.txt')
print(a)
It gives a column vector if the input is a column vector, and a row vector if the input is a row vector.
You might want to use the csv module:
import csv
import numpy as np
reader = csv.reader(open('file.txt'))
l = list(reader)
a = np.array(l)
a.shape   # (2, 1)
This way, you will get the correct array dimensions irrespective of the number of rows/columns present in the file. Note that the result is an array of strings, so use a.astype(float) if you need numbers.
I've written a wrapper for loadtxt to do this. It's similar to the answer from @petrichor, but I think matrix can't have a string data format (probably understandably), so that method doesn't seem to work if you're loading strings (such as column headings).
def my_loadtxt(filename, skiprows=0, usecols=None, dtype=None):
    d = np.loadtxt(filename, skiprows=skiprows, usecols=usecols, dtype=dtype, unpack=True)
    if len(d.shape) == 0:
        d = d.reshape((1, 1))
    elif len(d.shape) == 1:
        d = d.reshape((d.shape[0], 1))
    return d
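Usage would look like this (a sketch; data.txt stands in for the question's two-row file):
d = my_loadtxt('data.txt')
print(d.shape)  # (2, 1)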
