Comparing values in two numpy arrays with 'if' - python

I'm fairly new to numpy arrays and have encountered a problem when comparing one array with another.
I have two arrays, such that:
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])
I want to do something like the following:
if b > a:
    c = b
else:
    c = a
so that I end up with an array c = np.array([2,4,3,5,5]).
This can be otherwise thought of as taking the max value for each element of the two arrays.
However, I am running into the error
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all().
I have tried using these but I'm not sure that they are right for what I want.
Is someone able to offer some advice in solving this?

You are looking for the function np.fmax. It takes the element-wise maximum of the two arrays, ignoring NaNs.
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])
c = np.fmax(a, b)
The output is
array([2, 4, 3, 5, 5])
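The "ignoring NaNs" part only matters for float data. A minimal sketch of the difference from np.maximum, using made-up float inputs that are not part of the question:

```python
import numpy as np

# Hypothetical float inputs containing NaNs, chosen only for illustration.
a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 5.0, np.nan])

# fmax returns the non-NaN value when only one of the pair is NaN.
fmax_result = np.fmax(a, b)

# maximum propagates NaN whenever either value of the pair is NaN.
maximum_result = np.maximum(a, b)

print(fmax_result)     # values 2, 5, 3
print(maximum_result)  # values 2, nan, nan
```

For the integer arrays in the question the two functions give the same result.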

As with almost everything else in numpy, comparisons are done element-wise, returning a whole array:
>>> b > a
array([ True, True, False, True, False], dtype=bool)
So, is that true or false? What should an if statement do with it?
Numpy's answer is that it shouldn't try to guess, it should just raise an exception.
If you want to consider it true because at least one value is true, use any:
>>> if np.any(b > a): print('Yes!')
Yes!
If you want to consider it false because not all values are true, use all:
>>> if np.all(b > a): print('Yes!')
But I'm pretty sure you don't want either of these. You want to broadcast the whole if/else over the array.
You could of course wrap the if/else logic for a single value in a function, then explicitly vectorize it and call it:
>>> def mymax(a, b):
...     if b > a:
...         return b
...     else:
...         return a
>>> vmymax = np.vectorize(mymax)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
This is worth knowing how to do… but very rarely worth doing. There's usually a more indirect way to do it using natively-vectorized functions—and often a more direct way, too.
One way to do it indirectly is by using the fact that True and False are numerical 1 and 0:
>>> (b>a)*b + (b<=a)*a
array([2, 4, 3, 5, 5])
This computes 1*b[i] + 0*a[i] when b>a, and 0*b[i] + 1*a[i] when b<=a. A bit ugly, but not too hard to understand. There are clearer, but more verbose, ways to write this.
But let's look for an even better, direct solution.
First, notice that your mymax function will do exactly the same as Python's built-in max, for 2 values:
>>> vmymax = np.vectorize(max)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
Then consider that for something so useful, numpy probably already has it. And a quick search will turn up maximum:
>>> np.maximum(a, b)
array([2, 4, 3, 5, 5])
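If the goal is to broadcast an arbitrary if/else over the arrays rather than just take a maximum, np.where is one option. This is a sketch of that alternative, not something the answer above used:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])

# Element-wise "c = b if b > a else a":
# where the condition is True take the value from b, otherwise from a.
c = np.where(b > a, b, a)
print(c)
```

The two branch arguments can be any expressions that broadcast against the condition, so the same pattern covers cases where no ready-made function like maximum exists.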

Here's another way of achieving this:
c = np.array([y if y>z else z for y,z in zip(a,b)])

The following methods also work:
Use numpy.maximum
>>> np.maximum(a, b)
Use numpy.max and numpy.vstack
>>> np.max(np.vstack((a, b)), axis=0)

This may not be the most efficient approach, but it is closer to the code in the original question:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])
c = np.zeros_like(a)  # same shape and dtype as a
for i in range(len(a)):
    # compare one pair of elements at a time
    if b[i] > a[i]:
        c[i] = b[i]
    else:
        c[i] = a[i]

Related

How to map function over numpy with condition on each variable?

I get this error when trying to map this function over the numpy array:
>>> a = np.array([1, 2, 3, 4, 5])
>>> g = lambda x: 0 if x % 2 == 0 else 1
>>> g(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I was expecting result array([ 1, 0, 1, 0, 1])
When it works fine in this case:
>>> f = lambda x: x ** 2
>>> f(a)
array([ 1, 4, 9, 16, 25])
What can I do to map function g over the array a faster than a for loop, preferably using some of numpy's faster code?
This has problems:
a = np.array([1, 2, 3, 4, 5])
g = lambda x: 0 if x % 2 == 0 else 1
g(a)
A lambda is essentially just an unnamed function, which you happen to be naming here, so you might as well:
def g(x):
    return 0 if x % 2 == 0 else 1
But that's still a bit odd, since taking an integer modulo 2 already is 0 or 1, so this would be the same (when applied to integers, which is what you're looking to do):
def g(x):
    return x % 2
At which point you have to wonder if a function is needed at all. And it isn't, this works:
a = np.array([1, 2, 3, 4, 5])
a % 2
However, note the misunderstanding here: f = lambda x: x ** 2 followed by f(a) does not work because the lambda is applied to each element. It applies the operation to the whole array, and the array supports spreading the operation over its elements for raising to a power, just like it does for the modulo operator, which is why a % 2 works.
Result:
array([1, 0, 1, 0, 1], dtype=int32)
Note that this type of spreading isn't something that generally works - you shouldn't expect Python to just do the spreading when needed for any data type (like a list or set). It's a feature of numpy's implementation of arrays, the operations have been defined on the array and implemented to spread the operation over the elements.
You can execute some mathematical operations such as exponents on entire numpy arrays, so you're doing the equivalent of np.array([1, 2, 3, 4, 5])**2. The modulus operator also works on a whole array, but the conditional expression (0 if ... else 1) needs a single truth value, and an array of booleans does not have one, hence the error.
The lambda function is being applied to the entire array here, not to each individual element.
You can use np.vectorize here instead:
modulus = np.vectorize(lambda x: 0 if x % 2 == 0 else 1)
modulus(a)
Let's decompose this. The general form of your ternary expression is x if C else y, with C being the comparison. Let's pull out just that comparison to see what we get:
>>> a = np.array([1, 2, 3, 4, 5])
>>> a % 2 == 0
array([False, True, False, True, False])
That mod and comparison give a new array where each value has been modded and compared. Now you throw in an if (in this case if array([False, True, False, True, False])). But should this if be true if all of the array elements are True, or if just a single one is? What if the array is empty, is that different? That's why numpy has the methods listed in the error message - you have to decide what True means.
But not in this case. You really just wanted each element modulo 2, and you don't have to work so hard to get it.
>>> a = np.array([1, 2, 3, 4, 5])
>>> a % 2
array([1, 0, 1, 0, 1])
There's your answer!
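When the two branches are not as simple as 0 and 1 from a modulo, np.where expresses the same element-wise conditional directly. A small sketch using the array from the question:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Element-wise version of "0 if x % 2 == 0 else 1".
g_where = np.where(a % 2 == 0, 0, 1)

# For this particular function the plain modulo gives the same values.
g_mod = a % 2

print(g_where)
print(g_mod)
```

Unlike np.vectorize, np.where evaluates both branches for the whole array up front, so it suits branches that are cheap and safe to compute for every element.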

How to quickly grab specific indices from a numpy array?

But I don't have the index values, I just have ones in those same indices in a different array. For example, I have
a = array([3,4,5,6])
b = array([0,1,0,1])
Is there some NumPy method that can quickly look at both of these and extract all values from a whose indices match the indices of all 1's in b? I want it to result in:
array([4,6])
It is probably worth mentioning that my a array is multidimensional, while my b array will always have values of either 0 or 1. I tried using NumPy's logical_and function, though this returns ValueError with a and b having different dimensions:
a = numpy.array([[3,2], [4,5], [6,1]])
b = numpy.array([0, 1, 0])
print(numpy.logical_and(a, b))
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
Though this method does seem to work if a is flat. Either way, the return type of numpy.logical_and() is a boolean, which I do not want. Is there another way? Again, in the second example above, the desired return would be
array([[4,5]])
Obviously I could write a simple loop to accomplish this, I'm just looking for something a bit more concise.
Edit:
This will introduce more constraints, I should also mention that each element of the multidimensional array a may be any arbitrary length, that does not match its neighbour.
You can simply use fancy indexing.
b == 1
will give you a boolean array:
>>> from numpy import array
>>> a = array([3,4,5,6])
>>> b = array([0,1,0,1])
>>> b==1
array([False, True, False, True], dtype=bool)
which you can pass as an index to a.
>>> a[b==1]
array([4, 6])
Demo for your second example:
>>> a = array([[3,2], [4,5], [6,1]])
>>> b = array([0, 1, 0])
>>> a[b==1]
array([[4, 5]])
You could use compress:
>>> a = np.array([3,4,5,6])
>>> b = np.array([0,1,0,1])
>>> a.compress(b)
array([4, 6])
You can provide an axis argument for multi-dimensional cases:
>>> a2 = np.array([[3,2], [4,5], [6,1]])
>>> b2 = np.array([0, 1, 0])
>>> a2.compress(b2, axis=0)
array([[4, 5]])
This method will work even if b is shorter than the axis of a you're indexing against; the output is then truncated to the length of b.
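One detail worth spelling out: the 0/1 array has to be turned into booleans before it is used as an index, otherwise NumPy treats it as a list of integer positions. A short sketch with the arrays from the question:

```python
import numpy as np

a = np.array([3, 4, 5, 6])
b = np.array([0, 1, 0, 1])

# Integer indexing: picks a[0], a[1], a[0], a[1].
as_positions = a[b]

# Boolean indexing: keeps the elements where b is nonzero.
as_mask = a[b.astype(bool)]

# The same mask selects whole rows of a 2-D array.
a2 = np.array([[3, 2], [4, 5], [6, 1]])
b2 = np.array([0, 1, 0])
rows = a2[b2.astype(bool)]

print(as_positions)
print(as_mask)
print(rows)
```

Writing b == 1 as in the answer above has the same effect as b.astype(bool) when b only contains 0 and 1.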

ValueError: too many boolean indices for a n=600 array (float)

I am getting an issue where I am trying to run (on Python):
#Loading in the text file in need of analysis
x,y=loadtxt('2.8k to 293k 15102014_rerun 47_0K.txt',skiprows=1,unpack=True,dtype=float,delimiter=",")
C=-1.0 #Need to flip my voltage axis
yone=C*y #Actually flipping the array
plot(x,yone)#Test
origin=600.0#Where is the origin? i.e V=0, taking the 0 to 1V elements of array
xorg=x[origin:1201]# Array from the origin to the final point (n)
xfit=xorg[(x>0.85)==True] # Taking the array from the origin and shortening it further to get relevant area
It returns the ValueError. I have tried doing this process with a much smaller array of 10 elements and the xfit=xorg[(x>0.85)==True] command works fine. What the program is trying to do is to narrow the field of view of some data to a relevant region, so I can fit a line of best fit to a linear portion of the data.
I apologise for the formatting being messy, but this is the first question I have asked on this website, as I could not find anything that helped me understand where I am going wrong.
This answer is for people that don't know about numpy arrays (like me), thanks MrE for the pointers to numpy docs.
Numpy arrays have this nice feature of boolean masks.
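For numpy arrays, most operators return an array of the operation applied to every element, instead of a single result as with plain Python lists (the session below is Python 2; in Python 3 comparing a list or range with an integer raises a TypeError):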
>>> alist = range(10)
>>> alist
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> alist > 5
True
>>> anarray = np.array(alist)
>>> anarray
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> anarray > 5
array([False, False, False, False, False, False, True, True, True, True], dtype=bool)
You can use an array of bool as the index for a numpy array, in this case you get a filtered array for the positions where the corresponding bool array element is True.
>>> mask = anarray > 5
>>> anarray[mask]
array([6, 7, 8, 9])
The mask must not be bigger than the array:
>>> anotherarray = anarray[mask]
>>> anotherarray
array([6, 7, 8, 9])
>>> anotherarray[mask]
ValueError: too many boolean indices
So you can't use a mask bigger than the array you are masking:
>>> anotherarray[anarray > 7]
ValueError: too many boolean indices
>>> anotherarray[anotherarray > 7]
array([8, 9])
Since xorg is smaller than x, a mask based on x will be longer than xorg and you get the ValueError exception.
Change
xfit=xorg[x>0.85]
to
xfit=xorg[xorg>0.85]
x is larger than xorg, so x > 0.85 has more elements than xorg.
Try the following:
replace your code
xorg=x[origin:1201]
xfit=xorg[(x>0.85)==True]
with
mask = x > 0.85
xfit = xorg[mask[origin:1201]]
This works when x is a numpy.ndarray. Note that the basic slice x[origin:1201] returns a view of x, while boolean (advanced) indexing returns a copy, not a view; see the SciPy/NumPy documentation.
I'm unsure whether you like to use numpy, but when trying to fit data, numpy/scipy is a good choice anyway...
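Both suggested fixes can be checked against each other on synthetic data. The sketch below assumes a made-up x axis of 1201 evenly spaced points in place of the file from the question, and uses an integer origin, since recent NumPy versions reject float slice bounds:

```python
import numpy as np

# Hypothetical stand-in for the loaded data: 1201 points from -1 V to 1 V.
x = np.linspace(-1.0, 1.0, 1201)

origin = 600  # integer index of V = 0 in this synthetic array
xorg = x[origin:1201]

# Fix 1: build the mask from the array that is being indexed.
xfit_direct = xorg[xorg > 0.85]

# Fix 2: build the mask from x, then slice it the same way as the data.
mask = x > 0.85
xfit_sliced = xorg[mask[origin:1201]]

print(len(xorg), len(mask))
print(np.array_equal(xfit_direct, xfit_sliced))
```

Either way, the mask and the array it indexes end up with the same length. Depending on the NumPy version, a mismatch may be reported as a ValueError or an IndexError.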

Can Python perform vectorized operations?

I want to implement the following Matlab code in Python:
x=1:100;
y=20*log10(x);
I tried using Numpy to do this:
y = numpy.zeros(x.shape)
for i in range(len(x)):
y[i] = 20*math.log10(x[i])
But this uses a for loop; is there any way to do a vectorized operation like in Matlab? I know for some simple math such as division and multiplication, it's possible. But what about other more sophisticated operations like logarithm here?
y = numpy.log10(numpy.arange(1, 101)) * 20
In [30]: numpy.arange(1, 10)
Out[30]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [31]: numpy.log10(numpy.arange(1, 10))
Out[31]:
array([ 0. , 0.30103 , 0.47712125, 0.60205999, 0.69897 ,
0.77815125, 0.84509804, 0.90308999, 0.95424251])
In [32]: numpy.log10(numpy.arange(1, 10)) * 20
Out[32]:
array([ 0. , 6.02059991, 9.54242509, 12.04119983,
13.97940009, 15.56302501, 16.9019608 , 18.06179974, 19.08485019])
Yep, there certainly is.
x = numpy.arange(1, 101)  # the stop value is exclusive, so this matches Matlab's 1:100
y = 20 * numpy.log10(x)
Numpy has a lot of built-in array operators like log10. If it's not listed in numpy's documentation and you can't generate it from combining built-in methods, then there's no easy way to do it efficiently. You can implement a C-level function to work on numpy arrays and compile that, but this is a lot more work than one or two lines of Python code.
For your case you almost have the right output already:
y = 20*numpy.log10(x)
You may want to take a look at the Numpy documentation. This is a good place to start:
http://docs.scipy.org/doc/numpy/reference/routines.html
And specifically related to your question:
http://docs.scipy.org/doc/numpy/reference/routines.math.html
If you're not trying to do anything complicated, the original code could be implemented this way as well, without requiring the use of numpy, if I'm not mistaken.
>>> import math
>>> x = range(1, 101)
>>> y = [ 20 * math.log10(z) for z in x ]
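As a quick sanity check, the vectorized and the pure-Python versions can be compared directly. This is only a sketch; the variable names are chosen for illustration:

```python
import math
import numpy as np

# Vectorized: one call operates on all 100 values (1..100 inclusive).
x = np.arange(1, 101)
y_vectorized = 20 * np.log10(x)

# Pure Python: one math.log10 call per element.
y_loop = [20 * math.log10(z) for z in range(1, 101)]

# The two should agree up to floating-point rounding.
same = np.allclose(y_vectorized, y_loop)
print(same)
```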
Apart from performing vectorized operation using numpy standard vectorized functions, you can also make your custom vectorized function using numpy.vectorize. Here is one example:
>>> def myfunc(a, b):
...     "Return a-b if a>b, otherwise return a+b"
...     if a > b:
...         return a - b
...     else:
...         return a + b
>>>
>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])
As mentioned in the documentation, unlike numpy's standard vectorized functions, this won't improve performance: numpy.vectorize is provided for convenience and is essentially a for loop.
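For a simple two-branch function like myfunc, the same result can usually be obtained with natively vectorized operations. A sketch using np.where, which is a swapped-in alternative rather than what the answer above used:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = 2

# Element-wise "a - b if a > b else a + b".
# Both branches are computed for every element, then selected by the condition.
result = np.where(a > b, a - b, a + b)
print(result)
```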

compare two following values in numpy array

What is the best way to access two consecutive values in a numpy array?
example:
npdata = np.array([13,15,20,25])
for i in range(len(npdata)):
    print(npdata[i] - npdata[i+1])
this looks really messed up and additionally needs exception code for the last iteration of the loop.
any ideas?
Thanks!
numpy provides a function diff for this basic use case:
>>> import numpy
>>> x = numpy.array([1, 2, 4, 7, 0])
>>> numpy.diff(x)
array([ 1, 2, 3, -7])
Your snippet computes something closer to -numpy.diff(x).
How about range(len(npdata) - 1) ?
Here's code (using a simple array, but it doesn't matter):
>>> ar = [1, 2, 3, 4, 5]
>>> for i in range(len(ar) - 1):
...     print(ar[i] + ar[i + 1])
...
3
5
7
9
As you can see it successfully prints the sums of all consecutive pairs in the array, without any exceptions for the last iteration.
You can use ediff1d to get differences of consecutive elements. More generally, a[1:] - a[:-1] will give the differences of consecutive elements and can be used with other operators as well.
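The slicing idiom can be sketched on the array from the question. The variable names below are illustrative only:

```python
import numpy as np

npdata = np.array([13, 15, 20, 25])

# Next value minus current value, as numpy.diff and numpy.ediff1d compute it.
forward = npdata[1:] - npdata[:-1]
via_diff = np.diff(npdata)
via_ediff1d = np.ediff1d(npdata)

# Current value minus next value, as in the question's loop.
backward = npdata[:-1] - npdata[1:]

# The same pair of slices works with other operators, e.g. a comparison.
increasing = npdata[1:] > npdata[:-1]

print(forward, backward, increasing)
```

Because both slices have length len(npdata) - 1, no special handling is needed for the last element.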
