I have been trying to copy the individual elements from one 2D array to another. My code is as follows:
tp_matrix = np.array(tp_matrix)
my_array = np.empty(shape = (tp_matrix.shape))
for x in range(tp_matrix.shape[0]):
    for y in range(tp_matrix.shape[1]):
        my_array[x][y] = tp_matrix[x][y]
if np.array_equal(my_array, tp_matrix):
    print('Equal')
else:
    print('Not equal')
However, the two arrays are not equal for some reason. What is the problem here, and what can I do to solve it?
I cannot use numpy's copy function, as I want to later modify some of the elements of my_array while keeping the other values the same as those of tp_matrix.
Edit: On running the code I get the following message:
FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
Does this mean there is something wrong with the dataset (tp_matrix)?
Edit 2: I have tried the allclose and isclose functions but I get this error:
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The data is stored as floats. Also it is a bit large (399 x 5825).
Edit 3: Solved. I had to reinstall Python.
Use np.allclose to test the (almost) equality of float arrays, because of the way float numbers are represented in a computer.
For more details, you could read for instance "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
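For example, a minimal illustration (values chosen arbitrarily):
import numpy as np

a = np.array([0.1, 0.2]) + np.array([0.2, 0.1])
b = np.array([0.3, 0.3])

print(np.array_equal(a, b))  # False: 0.1 + 0.2 is not exactly 0.3 in binary floating point
print(np.allclose(a, b))     # True: equal within the default relative/absolute tolerances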
I tried to mimic what you are experiencing and did the following:
one = np.array([1,2,3])
two = np.array([one,one,one])
three = np.empty(shape=(two.shape))
for x in range(two.shape[0]):
    for y in range(two.shape[1]):
        three[x][y] = two[x][y]
Printing the contents of 'two' and 'three' gives the following result:
print(three)
array([[ 1., 2., 3.],
[ 1., 2., 3.],
[ 1., 2., 3.]])
print(two)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
Although for this small example numpy returns True if I test equality using np.array_equal, it is possible that rounding errors cause the test to be False in your case.
A workaround for this could be the following test:
sum(sum(two==three)) == two.shape[0]*three.shape[1]
Although there are probably more efficient ways.
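For instance, a sketch of the more idiomatic alternatives:
(two == three).all()          # single elementwise comparison, no nested Python sums
np.array_equal(two, three)    # exact elementwise equality
np.allclose(two, three)       # equality up to floating-point tolerance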
As far as I understand the documentation of numpy's negative function, its where option allows you to leave some array components unnegated:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1., 2.],
[-3., 4.],
[-5., 6.]])
However, when I try it, it seems that those values are (almost) zeroed instead:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1.00000000e+000, 6.92885436e-310],
[-3.00000000e+000, 6.92885377e-310],
[-5.00000000e+000, 6.92885375e-310]])
So how should I see the where option?
The documentation describes where like this:
Values of True indicate to calculate the ufunc at that position, values of False indicate to leave the value in the output alone.
Let's try an example using the out parameter:
x = np.ones(3)
np.negative(np.array([4.,5.,6.]), where=np.array([False,True,False]), out=x)
This sets x to [1., -5., 1.], and returns the same.
This makes some amount of sense once you realize that "leave the value in the output alone" literally means the output value is "don't care", rather than "same as the input" (the latter interpretation was how I read it the first time, too).
The problem comes in when you specify where but not out. Apparently the "ufunc machinery" (which is not visible in the implementation of np.negative()) creates an empty output array, meaning the values are indeterminate. So the locations at which where is False will have uninitialized values, which could be anything.
This seems pretty wrong to me, but there was a NumPy issue filed about it last year, and closed. It seems unlikely to change, so you'll have to work around it (e.g. by creating the output array yourself using zeros).
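A minimal sketch of that workaround (the where mask is the one from the question):
import numpy as np

a = np.array([[1., 2.], [3., 4.], [5., 6.]])

out = np.zeros_like(a)                         # pre-initialized, so the False positions stay 0
np.negative(a, where=[True, False], out=out)
# array([[-1.,  0.], [-3.,  0.], [-5.,  0.]])

out = a.copy()                                 # pre-fill with the input to leave values unnegated
np.negative(a, where=[True, False], out=out)
# array([[-1.,  2.], [-3.,  4.], [-5.,  6.]])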
Are there any sources or guidelines for safe, bug-free numerical programming with numpy?
I'm asking because I've painfully learned that numpy does many things that seem to really ask for bugs to happen, such as...
Adding matrices of different sizes ("broadcasting") without complaining:
In: np.array([1]) + np.identity(2)
Out: array([[ 2., 1.],
[ 1., 2.]])
Returning different data types depending on the input:
In: scalar1 = 1
In: scalar2 = 1.
In: np.array(scalar1).dtype
Out: dtype('int32')
In: np.array(scalar2).dtype
Out: dtype('float64')
Or simply not performing a desired operation (again, depending on the data type) without raising any warnings:
In: np.squeeze(np.array([[1, 1]])).ndim
Out: 1
In: np.squeeze(np.matrix([[1, 1]])).ndim
Out: 2
These are all bugs that are very hard to discover, since they raise no exceptions or warnings and often return results of valid data types / shapes. Therefore my question: are there any general guidelines for improving safety and preventing bugs in mathematical programming with numpy?
[Note that I don't believe that this question will attract "opinionated answers and discussions", since it is not about personal recommendations, but rather about whether there are any existing guidelines or sources on the subject at all - of which I could not find any.]
Frequently I ask SO questioners: what's the shape? the dtype? even the type? Keeping track of those properties is a big part of good numpy programming. Even in MATLAB I found that getting the size right was 80% of debugging.
type
The squeeze example revolves around type, the ndarray class versus the np.matrix subclass:
In [160]: np.squeeze(np.array([[1, 1]]))
Out[160]: array([1, 1])
In [161]: np.squeeze(np.matrix([[1, 1]]))
Out[161]: matrix([[1, 1]])
A np.matrix object is, by definition, always 2d. That's the core of how it redefines ndarray operations.
Many numpy functions delegate their work to methods. The code for np.squeeze is:
try:
    squeeze = a.squeeze
except AttributeError:
    return _wrapit(a, 'squeeze')
try:
    # First try to use the new axis= parameter
    return squeeze(axis=axis)
except TypeError:
    # For backwards compatibility
    return squeeze()
So In [161] is really:
In [163]: np.matrix([[1, 1]]).squeeze()
Out[163]: matrix([[1, 1]])
np.matrix.squeeze has its own documentation.
As a general rule we discourage the use of np.matrix. It was created years ago to make things easier for wayward MATLAB programmers. Back in those days MATLAB only had 2d matrices (even now MATLAB 'scalars' are 2d).
dtype
np.array is a powerful function. Usually its behavior is intuitive, but sometimes it makes too many assumptions.
Usually it takes cues from the input, whether integer, float, string, and/or lists:
In [170]: np.array(1).dtype
Out[170]: dtype('int64')
In [171]: np.array(1.0).dtype
Out[171]: dtype('float64')
But it provides a number of parameters. Use those if you need more control:
array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
In [173]: np.array(1, float).dtype
Out[173]: dtype('float64')
In [174]: np.array('1', float).dtype
Out[174]: dtype('float64')
In [177]: np.array('1', dtype=float,ndmin=2)
Out[177]: array([[1.]])
Look at its docs, and also at the https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html page, which lists many other array creation functions. Look at some of their code as well.
For example np.atleast_2d does a lot of shape checking:
def atleast_2d(*arys):
    res = []
    for ary in arys:
        ary = asanyarray(ary)
        if ary.ndim == 0:
            result = ary.reshape(1, 1)
        elif ary.ndim == 1:
            result = ary[newaxis, :]
        else:
            result = ary
        res.append(result)
    if len(res) == 1:
        return res[0]
    else:
        return res
Functions like this are good examples of defensive programming.
We get a lot of SO questions about 1d arrays with dtype=object.
In [272]: np.array([[1,2,3],[2,3]])
Out[272]: array([list([1, 2, 3]), list([2, 3])], dtype=object)
np.array tries to create a multidimensional array with a uniform dtype. But if the elements differ in size or can't be cast to the same dtype, it will fall back on object dtype. This is one of those situations where we need to pay attention to shape and dtype.
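A quick defensive check for catching this early (a sketch; newer numpy versions raise a ValueError for such ragged input instead of creating an object array):
arr = np.array([[1, 2, 3], [2, 3]])
assert arr.dtype != object, f"expected a numeric dtype, got {arr.dtype} with shape {arr.shape}"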
broadcasting
Broadcasting has been part of numpy forever, and there's no way to turn it off. Octave and MATLAB added it later, and do provide warning switches.
The first defensive step is to understand the broadcasting principles, namely:
it can add leading size-1 dimensions to make the number of dimensions match
it stretches size-1 dimensions to match the other operand.
So a basic example is:
In [180]: np.arange(3)[:,None] + np.arange(4)
Out[180]:
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 3, 4, 5]])
The first term is (3,), explicitly expanded to (3,1) with [:,None]. The second is (4,), which broadcasting expands to (1,4). Together, (3,1) and (1,4) broadcast to (3,4).
Many numpy functions have parameters that make keeping track of dimensions easier. For example sum (and others) has a keepdims parameter:
In [181]: arr = _
In [182]: arr.sum(axis=0)
Out[182]: array([ 3, 6, 9, 12]) # (4,) shape
In [183]: arr.sum(axis=0,keepdims=True)
Out[183]: array([[ 3, 6, 9, 12]]) # (1,4) shape
In [184]: arr/_ # (3,4) / (1,4) => (3,4)
Out[184]:
array([[0. , 0.16666667, 0.22222222, 0.25 ],
[0.33333333, 0.33333333, 0.33333333, 0.33333333],
[0.66666667, 0.5 , 0.44444444, 0.41666667]])
In this case keepdims isn't essential, since (3,4)/(4,) works. But with an axis=1 sum the shape becomes (3,), which can't broadcast with (3,4); (3,1) can:
In [185]: arr/arr.sum(axis=1,keepdims=True)
Out[185]:
array([[0. , 0.16666667, 0.33333333, 0.5 ],
[0.1 , 0.2 , 0.3 , 0.4 ],
[0.14285714, 0.21428571, 0.28571429, 0.35714286]])
To manage shapes I like to:
display shape while debugging
test snippets interactively
test with diagnostic shapes, e.g. np.arange(24).reshape(2,3,4)
use assertion statements in functions, e.g. assert arr.ndim == 1 (see the sketch below)
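A minimal sketch of such a defensive check (the function and message are just illustrative):
def normalize(arr):
    # fail fast, reporting the offending shape
    assert arr.ndim == 1, f'expected a 1d array, got shape {arr.shape}'
    return arr / arr.sum()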
typing
Recent Python 3 versions have added a typing module
https://docs.python.org/3/library/typing.html
Even for built-in Python types it's provisional. I'm not sure how much support has been added for numpy.
In some ways, an answer to this question is no different than general guidelines for safe programming:
Check and sanitise inputs early, for every function
Maintain relevant unit tests.
Yes, this may sound like extra overhead, but the reality is you're probably already doing such checks and tests by hand anyway, so it's good practice to put it down on paper and formalise / automate the process. E.g., while you may never have expected a matrix output specifically, any unit test that checked that your output is the expected array would have failed reliably.
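For instance, a sketch of such a test using numpy's own testing helpers (the function under test is just the squeeze example from the question):
import numpy as np
from numpy.testing import assert_array_equal

def test_squeeze_gives_1d():
    out = np.squeeze(np.array([[1, 1]]))
    assert out.ndim == 1                       # would have failed reliably for a matrix result
    assert_array_equal(out, np.array([1, 1]))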
You might also want to have a look at specialised testing tools that are specific to scientific code, e.g. the Hypothesis package
One thing that is specific to numpy is the handling of floating-point errors; by default numpy simply prints a warning, which can easily be missed (and does not cater for proper exception-handling workflows). You can make it throw proper warnings / exceptions that you can capture, via the numpy.seterr function -- e.g. numpy.seterr(all='raise').
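A minimal sketch of that:
import numpy as np

np.seterr(all='raise')                 # turn silent floating-point warnings into exceptions
try:
    np.array([1.0]) / np.array([0.0])
except FloatingPointError as e:
    print('caught:', e)                # caught: divide by zero encountered in ...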
If you want to use numpy in a "safer" way, you'll probably have to create your own safety net. One way to do so would be to define wrappers that enforce the rules you want your code to obey. You can come up with your own wrappers and tests as you go along and/or stumble upon behaviour that you consider problematic.
Some toy examples:
Always have float arrays:
def arrayf64(*args, **kwargs):
    kwargs.setdefault("dtype", np.float64)
    return np.array(*args, **kwargs)
Disable broadcasting:
def without_broadcasting(op, arr1, arr2):
    # checking ndim alone would not stop (3,1) op (1,3) from broadcasting
    assert arr1.shape == arr2.shape
    return op(arr1, arr2)
Warn when using np.matrix:
import warnings

_np_matrix = np.matrix          # keep a reference, or the wrapper would call itself

def safe_np_matrix(*args, **kwargs):
    warnings.warn('Unsafe code detected. Usage of np.matrix found.')
    return _np_matrix(*args, **kwargs)

np.matrix = safe_np_matrix
I want to create a numpy ndarray by specifying a function of row and column to define the values.
For example, I'd like to be able to do something like this (completely incorrect syntax, but you should get the idea):
>>> np.ndarray((2,3), lambda r,c: 3*r+c)
[[ 0 1 2]
[ 3 4 5]]
Searching online has yielded nothing, though I've had trouble thinking of exactly how to search for it...
Right now I've got (the equivalent of) the following code, but it's horribly inefficient:
def ndarrayFuncFill(size, func):
    z = np.empty(size)
    for r in np.arange(size[0]):
        for c in np.arange(size[1]):
            z[r][c] = func(r, c)
    return z
>>> ndarrayFuncFill((2,3), lambda r,c: 3*r+c)
array([[ 0., 1., 2.],
[ 3., 4., 5.]])
Unfortunately, the function I particularly want to use this with right now is not something I can easily rewrite as a ufunc or anything like that. I pretty much have to treat it as a black box.
The function I'm actually interested in using this with (not something so simple as the above lambda), is not one I have permission to post. However, it essentially does interpolation on a lookup table. So you give it a row and column, and then it translates that to indices in a lookup table -- but there's some tricky stuff going on where it's not just a one-to-one lookup, it sometimes does a combination of 'nearby' values, and that sort of thing. So it's not the most efficient function either, but I'd rather not have too many other silly sources of waste like nested for-loops.
Any suggestions?
You could try using index arrays.
For your simple example, using np.indices you could do something like:
import numpy as np
r, c = 2, 3
a = np.empty((r, c))
b = np.indices((r, c))
a[b[0], b[1]] = 3 * b[0] + b[1]
So then we have:
>>> a
array([[ 0., 1., 2.],
[ 3., 4., 5.]])
The fastest solution for your particular example is np.arange(6).reshape(2, 3). In general you could use np.vectorize to apply a scalar function over index arrays, but this isn't optimized ("The implementation is essentially a for loop.").
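For example, a sketch of the np.vectorize route for a black-box scalar function:
import numpy as np

def black_box(r, c):            # stands in for the untouchable lookup/interpolation function
    return 3 * r + c

r, c = np.indices((2, 3))       # row and column index grids, each of shape (2, 3)
np.vectorize(black_box)(r, c)
# array([[0, 1, 2],
#        [3, 4, 5]])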
This seems like a simple problem but I can't figure it out.
I have a numpy array of an arbitrary dimension (rank) N. I need to set a single element in the array to 0 given by the index values in a 1D array of length N. So for example:
import numpy as np
A=np.ones((2,2,2))
b=[1,1,1]
so at first I thought
A[b]=0
would do the job, but it did not.
If I knew A had a rank of 3 it would be a simple case of doing this:
A[b[0],b[1],b[2]]=0
but the rank of A is not known until runtime, any thoughts?
Indexing in numpy has somewhat complicated rules. In your particular case this warning applies:
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
Also recognize that x[[1,2,3]] will trigger advanced indexing, whereas x[[1,2,slice(None)]] will trigger basic slicing.
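Applied to the arrays from the question, the difference looks like this (a quick sketch):
A = np.ones((2, 2, 2))
b = [1, 1, 1]
A[b].shape       # (3, 2, 2): advanced indexing; A[1] is selected three times along axis 0
A[tuple(b)]      # 1.0: basic indexing; the single element A[1, 1, 1]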
You want simple indexing (addressing a particular element), so you'll have to cast your list to a tuple:
A[tuple(b)] = 0
Result:
>>> A
array([[[ 1., 1.],
[ 1., 1.]],
[[ 1., 1.],
[ 1., 0.]]])
I have defined a couple of arrays in Python, but I am having a problem with calculating their product.
import numpy as np
phi = np.array([[ 1., 1.],[ 0., 1.]])
P = np.array([[ 999., 0.],[ 0., 999.]])
np.dot(phi, P, phi.T)
I get the error:
ValueError: output array is not acceptable (must have the right type, nr dimensions, and be a C-Array)
But I do not know what the problem is, since each matrix/array is 2 by 2.
As the documentation explains, numpy.dot only multiplies two matrices. The third, optional argument is an array in which to store the results. If you want to multiply three matrices, you will need to call dot twice:
numpy.dot(numpy.dot(phi, P), phi.T)
Note that arrays have a dot method that does the same thing as numpy.dot, which can make things easier to read:
phi.dot(P).dot(phi.T)
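On Python 3.5+ (with numpy 1.10+) you can also use the @ matrix-multiplication operator, which reads even more naturally:
phi @ P @ phi.T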
phi.T is the same as phi.transpose() (as stated in the docs). It is basically the return value of a class method, so you can't use it as output storage for the dot product.
Update
It appears that there is an additional problem here, that can be seen if saving the transposed matrix into new variable and using it as an output:
>>> g = phi.T
>>> np.dot(phi, P, g)
is still giving an error. The problem seems to be with the way the result of the transpose is stored in memory. The output parameter for the dot product has to be a C-contiguous array, but in this case g is not. To overcome this issue, the numpy.ascontiguousarray function can be used, which solves the problem:
>>> g = np.ascontiguousarray(phi.T)
>>> np.dot(phi, P, g)
array([[ 999., 999.],
[ 0., 999.]])
The error message indicates that there are 3 reasons why it cannot perform np.dot(phi, P, out=phi.T):
"must have the right type": That is OK in the first example, since all the elements of P and phi are floating-point numbers. But not in the other example mentioned in the comments, where the c[0,0] element is a floating-point number while the output array would have to be integer at all positions, since both a and b contain integers everywhere.
"nr dimensions": 2x2 is the expected dimension of the output array, so the problem is definitely not with the dimensions.
"must be a C-Array": This actually means that the output array must be C-contiguous. There is a very good description of what C- and F-contiguous actually mean: difference between C and F contiguous arrays. To make a long story short: if phi is C-contiguous (and by default it is), then phi.T will be F-contiguous.
You can verify this by inspecting the flags attribute:
>>> phi.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
...
>>> phi.T.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
...