Still trying to earn my numpy stripes: I want to perform an arithmetic operation on two numpy arrays, which is simple enough:
return 0.5 * np.sum(((array1 - array2) ** 2) / (array1 + array2))
Problem is, I need to be able to specify the condition that, if both arrays are element-wise 0 at the same element i, don't perform the operation at all--would be great just to return 0 on this one--so as not to divide by 0.
However, I have no idea how to specify this condition without resorting to the dreaded nested for-loop. Thank you in advance for your assistance.
Edit: Would also be ideal not to have to resort to a pseudocount of +1.
Just replace np.sum() by np.nansum():
return 0.5 * np.nansum(((array1 - array2) ** 2) / (array1 + array2))
np.nansum() treats nans as zero.
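A minimal sketch of the np.nansum() approach, assuming both inputs are non-negative (so the only zero denominators are 0/0, which give NaN rather than inf). The function name chi2_distance_nansum and the sample values are made up for illustration, and np.errstate is only there to silence the warning for the 0/0 elements:

```python
import numpy as np

def chi2_distance_nansum(array1, array2):
    # Hypothetical helper name; assumes non-negative inputs.
    a = np.asarray(array1, dtype=float)
    b = np.asarray(array2, dtype=float)
    # 0/0 yields NaN and an "invalid value" warning, which is silenced here.
    with np.errstate(invalid="ignore"):
        terms = (a - b) ** 2 / (a + b)
    # nansum skips the NaN terms, so they contribute 0 to the total.
    return 0.5 * np.nansum(terms)

result = chi2_distance_nansum([0.0, 1.0, 3.0], [0.0, 3.0, 1.0])
print(result)  # 1.0
```

If the arrays can hold values of opposite sign, a zero denominator with a nonzero numerator gives inf, which nansum does not skip.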
return 0.5 * np.sum(np.select([(array1 == 0) & (array2 == 0)], [0], default=((array1 - array2) ** 2) / (array1 + array2)))
should do the trick (the division still emits a warning for the 0/0 elements, but their results are discarded). numpy.where could be used the same way.
You could also try post-applying numpy.nan_to_num:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
although I found that when your code hits a divide by zero, it gives a warning but fills that element with zero (when doing integer math) or NaN (when using floats).
If you want to skip the sum when you have a divide by zero, you could also just do the calculation and then test for the NaN before returning:
xx = np.sum(((array1 - array2) ** 2) / (array1 + array2))
if np.isnan(xx):
    return 0
else:
    return xx
Edit: To silence warnings you could try messing around with numpy.seterr:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html
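Another option that avoids both the warning and the NaN is to skip the division wherever the denominator is zero. This is a sketch rather than a drop-in answer: the function name chi2_distance_masked is hypothetical, and it relies on the out= and where= arguments of np.divide so that skipped elements keep the 0 they were initialised with:

```python
import numpy as np

def chi2_distance_masked(array1, array2):
    # Hypothetical helper name, used only for this illustration.
    a = np.asarray(array1, dtype=float)
    b = np.asarray(array2, dtype=float)
    num = (a - b) ** 2
    den = a + b
    # Divide only where den != 0; the other elements stay at 0 from `out`.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    return 0.5 * terms.sum()

result = chi2_distance_masked([0.0, 1.0, 3.0], [0.0, 3.0, 1.0])
print(result)  # 1.0
```

No division by zero is ever performed here, so there is nothing to silence with numpy.seterr.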
I'm trying to simplify some expressions of positive odd integers with sympy. But sympy refuses to expand floor, making the simplification hard to proceed.
To be specific, x is a positive odd integer (actually in my particular use case, the constraint is even stricter. But sympy can only do odd and positive, which is fine). x // 2 should be always equal to (x - 1) / 2. Example code here:
from sympy import Symbol, simplify
x = Symbol('x', odd=True, positive=True)
expr = x // 2 - (x - 1) / 2
print(simplify(expr))
prints -x/2 + floor(x/2) + 1/2. Ideally it should print 0.
What I've tried so far:
Simplify (x - 1) // 2 - (x - 1) / 2. Turns out to be 0.
Multiply the whole thing by 2: 2 * (x // 2 - (x - 1) / 2). Gives me: -x + 2*floor(x/2) + 1.
Try to put more weights on the FLOOR op by customizing the measure. No luck.
Use sympy.core.evaluate(False) context when creating the expression. Nuh.
Tune other parameters like ratio, rational, and play with other function like expand, factor, collect. Doesn't work either.
EDIT: Wolfram alpha can do this.
I tried to look at the assumptions of x along with some expressions. It surprises me that ((x - 1) / 2).is_integer returns None, which means unknown.
I'm running out of clues. I'm even looking for alternatives to sympy. Any ideas, guys?
I fail to see why sympy can't simplify that.
But, on the other hand, I only discovered the existence of the odd parameter just now, with your question.
What I would have done, without the knowledge of odd is
k = Symbol('k', positive=True, integer=True)
x = 2*k-1
expr = x // 2 - (x - 1) / 2
Then, expr is 0, without even the need to simplify.
So, I can't say why your way doesn't work (and why that odd parameter exists if it is not used correctly to guess that x-1 is even, and therefore (x-1)/2 is an integer). But, in the meantime, my way of defining an odd integer x works.
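As a sanity check on the substitution approach, here is a small sketch that builds the expression from x = 2*k - 1 and also spot-checks the floor form at a few odd values. The symbol n and the sample values are chosen only for illustration:

```python
from sympy import Symbol, floor

# Odd integer expressed as 2*k - 1 with k a positive integer.
k = Symbol('k', positive=True, integer=True)
x = 2*k - 1
expr = x // 2 - (x - 1) / 2
print(expr)  # 0

# Numeric spot check of floor(n/2) - (n - 1)/2 at a few odd values
# (n is a plain symbol introduced for this check).
n = Symbol('n')
floor_form = floor(n / 2) - (n - 1) / 2
spot_checks = [floor_form.subs(n, value) for value in (1, 3, 5, 7, 99)]
print(spot_checks)  # [0, 0, 0, 0, 0]
```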
There is some reluctance to make too much automatic in SymPy, but this seems like a case that could be addressed (since (x-1)/2 is simpler than floor(x/2)). Until then, however, you can run a replacement on your expression which makes this transformation for you.
Let's define a preferred version of floor:
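from sympy import floor

def _floor(x):
    # Rewrite floor(n/2) when the parity of the numerator n is known.
    n, d = x.as_numer_denom()
    if d == 2:
        if n.is_odd:
            return (n - 1)/2
        if n.is_even:
            return n/2
    # Otherwise fall back to the regular floor.
    return floor(x)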
When you have an expression with floor that you want to evaluate, replace floor with _floor:
>>> x = Symbol('x', odd=True)
>>> eq=x // 2 - (x - 1) / 2
>>> eq.replace(floor, _floor)
0
I've got some trouble with the precision of my array operations. I'm doing a lot of array calculations where some cells of the array have to be left out, done either by masking or, in this case, by assigning very small values near np.finfo(float).tiny to the array cells to leave out.
But during array operations this causes an error of around 1e-14 which is quite near to machine epsilon. But still I don't know where the error is coming from and how to avoid it. Since I perform these operations several million times, the errors stack up to a total error of around 2-3%.
Here is my minimum working example:
arr = np.arange(20).astype(float)
arr[0] = 1e-290
t1 = np.random.rand(20) * 100
t2 = np.random.rand(20) * 100
a = (arr * (t1 - t2)).sum()
b = (arr * (t1 - t2))[1:].sum()
d = (arr * (t1 - t2))[0].sum()
c = b - a
print(c)
# Out[99]: 4.5474735088646412e-13
To avoid this problem, I tried to mask arr:
arr_mask = np.ma.masked_where(arr < 1e-200, arr)
a_mask = (arr_mask * (t1 - t2)).sum()
b_mask = (arr_mask * (t1 - t2))[1:].sum()
c_mask = b_mask - a_mask
print(c_mask)
# Out[118]: 4.5474735088646412e-13
Why is the difference c so many orders of magnitude bigger than d, which should be the difference? I guess some machine epsilon problem from assigning such a small value to the array in the first place? But still np.finfo(float).eps with 2.2204460492503131e-16 is around 1000 times smaller than c.
How can I avoid this? Setting the elements to zero won't work, since I have lots of divisions. In this case I can't use masking for several reasons. BUT the position of the cells which have to be left out does NEVER change. So can I somehow assign a "safe" value to these cells to leave them out without altering the result of the total array operations?
Thanks in advance!
The granularity of a given float type is not fixed but depends on the size of the value you are starting from. I encourage you to play with the numpy.nextafter function:
>>> a = 1.5
>>> np.nextafter(a, -1)
1.4999999999999998
>>> a - np.nextafter(a, -1)
2.220446049250313e-16
>>> a = 1e20
>>> np.nextafter(a, -1)
9.999999999999998e+19
>>> a - np.nextafter(a, -1)
16384.0
This shows that the smallest positive difference you can obtain by subtracting some fp number from a depends on how large a is.
You should now be able to work out what happens in your example.
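To make the link to the question concrete, np.spacing reports the gap to the next representable float at a given magnitude. The value 3000.0 below is a hypothetical stand-in for a sum like a or b, since the real sums depend on the random draws:

```python
import numpy as np

# Gap between adjacent doubles at three different magnitudes.
ulp_small = np.spacing(1.5)     # 2**-52, the familiar machine epsilon
ulp_mid = np.spacing(3000.0)    # 2**-41; 3000.0 is a made-up magnitude
ulp_large = np.spacing(1e20)    # 16384.0, as in the nextafter example

print(ulp_small, ulp_mid, ulp_large)
```

2**-41 is about 4.547e-13, the same value as the c printed in the question. That would be consistent with a and b being sums somewhere between 2048 and 4096 that differ by a single rounding step.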
Say I have two matrices, A and B.
I want to calculate the magnitude of difference between the two matrices. That is, without using iteration.
Here's what I have so far:
def mymagn(A, B):
    i = 0
    j = 0
    x = np.shape(A)
    y = np.shape(B)
    while i < x[1]:
        while j < y[1]:
            np.sum((A[i][j] - B[i][j]) * (A[i][j] - B[i][j]))
            j += 1
        i += 1
As I understand it, generally the value should be small with two similar matrices but I'm not getting that, can anyone help? Is there any way to get rid of the need to iterate?
This should do it:
def mymagn(A, B):
return np.sum((B - A) ** 2)
For arrays/matrices of the same size, addition/subtraction are element-wise (like in MATLAB). Exponentiation with scalar exponent is also element-wise. And np.sum will by default sum all elements (along all axes).
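A short usage sketch with made-up 2x2 matrices. It also shows np.linalg.norm, which for a 2-D array defaults to the Frobenius norm, i.e. the square root of the same sum of squares. Which of the two counts as the "magnitude" is an assumption about what the question wants:

```python
import numpy as np

def mymagn(A, B):
    # Sum of squared element-wise differences over all axes.
    return np.sum((B - A) ** 2)

# Illustrative matrices that differ in one element by 2.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0, 2.0], [3.0, 6.0]])

squared = mymagn(A, B)             # 4.0
frobenius = np.linalg.norm(A - B)  # 2.0, the square root of `squared`
print(squared, frobenius)
```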
numpy seems to not be a good friend of complex infinities
While we can evaluate:
In[2]: import numpy as np
In[3]: np.mean([1, 2, np.inf])
Out[3]: inf
The following result is more cumbersome:
In[4]: np.mean([1 + 0j, 2 + 0j, np.inf + 0j])
Out[4]: (inf+nan*j)
...\_methods.py:80: RuntimeWarning: invalid value encountered in cdouble_scalars
ret = ret.dtype.type(ret / rcount)
I'm not sure the imaginary part makes sense to me. But please do comment if I'm wrong.
Any insight into interacting with complex infinities in numpy?
Solution
To compute the mean we divide the sum by a real number. This division causes problems because of type promotion (see below). To avoid type promotion we can manually perform this division separately for the real and imaginary part of the sum:
n = 3
s = np.sum([1 + 0j, 2 + 0j, np.inf + 0j])
mean = np.real(s) / n + 1j * np.imag(s) / n
print(mean) # (inf+0j)
Rationale
The issue is not related to numpy but to the way complex division is performed. Observe that ((1 + 0j) + (2 + 0j) + (np.inf + 0j)) / (3+0j) also results in (inf+nanj).
The result needs to be split into a real and an imaginary part. For division both operands are promoted to complex, even if you divide by a real number. So basically the division is:
a + bj
--------
c + dj
The division operation does not know that d=0. So to split the result into real and imaginary it has to get rid of the j in the denominator. This is done by multiplying numerator and denominator with the complex conjugate:
 a + bj     (a + bj) * (c - dj)     ac + bd + bcj - adj
-------- = --------------------- = ---------------------
 c + dj     (c + dj) * (c - dj)         c**2 + d**2
Now, if a=inf and d=0 the term a * d * j = inf * 0 * j = nan * j.
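The formula above can be evaluated by hand with plain floats to see where the NaN appears. This is a sketch of the textbook formula only: the helper name is hypothetical, and actual library implementations of complex division may use a different but equivalent algorithm:

```python
import math

def complex_divide_textbook(a, b, c, d):
    # (a + bj) / (c + dj) via multiplication by the conjugate (c - dj).
    denom = c * c + d * d
    real = (a * c + b * d) / denom
    imag = (b * c - a * d) / denom  # a * d is inf * 0 = nan when a=inf, d=0
    return real, imag

real, imag = complex_divide_textbook(math.inf, 0.0, 3.0, 0.0)
print(real, imag)  # inf nan
```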
When you run the function with np.inf in your array, np.mean and other functions such as np.max() return infinity. But when calculating the mean of complex numbers, an infinite complex number is an infinite number in the complex plane whose complex argument is unknown or undefined, so you get nan*j as the imaginary part.
In order to get around this problem, you should ignore the infinity items in such mathematical operations. You can use isfinite() function to detect them and apply the function on finite items:
In [16]: arr = np.array([1 + 0j, 2 + 0j, np.inf + 0j])
In [17]: arr[np.isfinite(arr)]
Out[17]: array([ 1.+0.j, 2.+0.j])
In [18]: np.mean(arr[np.isfinite(arr)])
Out[18]: (1.5+0j)
Because of type promotion.
When you do the division of a complex by a real, like (inf + 0j) / 2, the (real) divisor gets promoted to 2 + 0j.
And by complex division, the imaginary part is equal to (0 * 2 - inf * 0) / 4. Note the inf * 0 here which is an indeterminate form, and it evaluates to NaN. This makes the imaginary part NaN.
And back to the topic. When numpy calculates the mean of a complex array, it really doesn't try to do anything clever. First it reduces the array with the "addition" operation, obtaining the sum. After that, the sum is divided by the count. This sum contains an inf in the real part, which causes the trouble described above when the divisor (count) gets promoted from integral type to complex floating point.
Edit: a word about solution
The IEEE floating point "infinity" is really a very primitive construct that represents indeterminate forms like 1 / 0. These forms are not constant numbers, but possible limits. The special inf or NaN "floating point numbers" are placeholders that notify you of the presence of indeterminate forms. They say nothing about the existence or type of the limit, which you must determine from the mathematical context.
Even for real numbers, the underlying limit can depend on how you approach the limit. A superficial 1 / 0 form can go to positive or negative infinity. On the complex plane, things are even more complex (well). For example, you may run into branch cuts and (different kinds of) singularities. There's no universal solution that fits all.
Tl;dr: Fix the underlying problem in the face of ambiguous/incomplete/corrupted data, or prove that the end computational result can withstand such corruption (which can happen).
I don't know how to describe this well so I'll just show it.
How do I do this...
for iy in random_y:
print(x[np.where(y == iy)], iy)
X y
[ 0.5] : 0.247403959255
[ 2.] : 0.841470984808
[ 49.5]: -0.373464754784
without for loops and I get a solution as a single array like when you use np.where() or array[cond]. Since you know, this is Python B)
NOTE: The reason why I want to do this is because I have a random subset of the Y values and I want to find the corresponding X values.
If you are looking for exact matches, you can simply use np.in1d as this is a perfect scenario for its usage, like so -
first_output = x[np.in1d(y,random_y)]
second_output = random_y[np.in1d(random_y,y)]
If you are dealing with floating-point numbers, you might want to use a tolerance in the comparisons. For such cases, you can use NumPy broadcasting and then np.where, like so -
tol = 1e-5 # Edit this to change tolerance
R,C = np.where(np.abs(random_y[:,None] - y)<=tol)
first_output = x[C]
second_output = random_y[R]
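Both variants can be tried on a small self-contained example. The data below are invented for illustration (y is taken as sin(x / 2), and noisy_y simulates values that are only approximately equal). np.isin is used as the newer spelling of np.in1d:

```python
import numpy as np

# Made-up data: y is a function of x, random_y is a subset of y.
x = np.array([0.5, 2.0, 49.5, 7.0])
y = np.sin(x / 2)
random_y = y[[2, 0]]

# Exact matches: the mask follows the order of y, not of random_y.
exact_x = x[np.isin(y, random_y)]
print(exact_x)  # [ 0.5 49.5]

# Tolerant matches: compare every random_y value against every y value.
tol = 1e-5
noisy_y = random_y + 1e-9  # hypothetical small perturbation
R, C = np.where(np.abs(noisy_y[:, None] - y) <= tol)
matched_x = x[C]           # follows the order of noisy_y
print(matched_x)  # [49.5  0.5]
```

The broadcasted comparison builds a len(random_y) by len(y) array, so it trades memory for the absence of a Python loop.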
Maybe this could do the trick(not tested):
print("\n".join(str(x[np.where(y == iy)]) + " " + str(iy) for iy in random_y))