How to map function over numpy with condition on each variable? - python

I get this error when trying to map this function over a numpy array:
>>> a = np.array([1, 2, 3, 4, 5])
>>> g = lambda x: 0 if x % 2 == 0 else 1
>>> g(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I was expecting result array([ 1, 0, 1, 0, 1])
When it works fine in this case:
>>> f = lambda x: x ** 2
>>> f(a)
array([ 1, 4, 9, 16, 25])
What can I do to map function g over the array a faster than a for loop, preferably using some of numpy's faster code?

This has problems:
a = np.array([1, 2, 3, 4, 5])
g = lambda x: 0 if x % 2 == 0 else 1
g(a)
A lambda is essentially just an unnamed function, which you happen to be naming here, so you might as well:
def g(x):
    return 0 if x % 2 == 0 else 1
But that's still a bit odd, since taking an integer modulo 2 already is 0 or 1, so this would be the same (when applied to integers, which is what you're looking to do):
def g(x):
    return x % 2
At which point you have to wonder if a function is needed at all. And it isn't, this works:
a = np.array([1, 2, 3, 4, 5])
a % 2
However, note the misunderstanding here: f = lambda x: x ** 2 followed by f(a) does not work because the lambda is applied to each element. It is applied to the array as a whole, and the array spreads the power operation over its elements, just as it does for the modulo operator, which is why a % 2 works.
Result of a % 2:
array([1, 0, 1, 0, 1], dtype=int32)
Note that this type of spreading isn't something that generally works. You shouldn't expect Python to do the spreading for any data type (like a list or set). It's a feature of numpy's implementation of arrays: the operators have been defined on the array type and implemented to spread the operation over the elements.
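To illustrate the point about lists, here is a minimal sketch that compares a plain list with an array. The variable names are only for illustration.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# A plain list does not define the % operator, so this raises TypeError.
try:
    values % 2
    list_supports_mod = True
except TypeError:
    list_supports_mod = False

# With a list, the per-element work has to be spelled out.
from_list = [v % 2 for v in values]

# An ndarray defines % itself and applies it to every element.
from_array = np.array(values) % 2

print(list_supports_mod)  # False
print(from_list)          # [1, 0, 1, 0, 1]
print(from_array)         # [1 0 1 0 1]
```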

You can apply some mathematical operations, such as exponentiation, to an entire numpy array, so f(a) is the equivalent of np.array([1, 2, 3, 4, 5]) ** 2. The modulus operator works the same way, so a % 2 == 0 produces an array of booleans. The error comes from the conditional expression: x if C else y needs a single truth value, and an array with several elements does not have one.
The lambda function is being applied to the entire array here, not to each individual element.
You can use np.vectorize here instead:
modulus = np.vectorize(lambda x: 0 if x % 2 == 0 else 1)
modulus(a)
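Since the question asks for something faster than a for loop, it is worth knowing that the numpy documentation describes np.vectorize as a convenience that is essentially a loop. A possible alternative for a two-way condition like this one is np.where, sketched here with the array from the question:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# np.where(condition, x, y) takes x where the condition holds and y elsewhere,
# which is the array version of "x if condition else y".
result = np.where(a % 2 == 0, 0, 1)

print(result)  # [1 0 1 0 1]
```

The condition a % 2 == 0 is evaluated once for the whole array, so no Python-level loop is involved.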

Let's decompose this. The general form of your ternary expression is x if C else y, with C being the comparison. Let's pull out just that comparison to see what we get:
>>> a = np.array([1, 2, 3, 4, 5])
>>> a % 2 == 0
array([False, True, False, True, False])
That mod and comparison give a new array where each value has been reduced modulo 2 and compared. Now you throw in an if (in this case if array([False, True, False, True, False])). But should this if be true if all of the array elements are True, or maybe just a single one? What if the array is empty, is that different? That's why numpy has the methods listed in the error message: you have to decide what True is.
But not in this case. You really just wanted each element modulo 2, and you don't have to work so hard to get it.
>>> a = np.array([1, 2, 3, 4, 5])
>>> a % 2
array([1, 0, 1, 0, 1])
There's your answer!

Related

How to extract numpy array stored in tuple?

Let's consider very easy example:
import numpy as np
a = np.array([0, 1, 2])
print(np.where(a < -1))
(array([], dtype=int64),)
print(np.where(a < 2))
(array([0, 1]),)
I'm wondering if it's possible to extract the length of those arrays, i.e. I want to know that the first array is empty and the second is not. Usually this can easily be done with the len function, but here the numpy array is stored in a tuple. Do you know how it can be done?
Just use this:
import numpy as np
a = np.array([0, 1, 2])
x = np.where(a < 2)[0]
print(len(x))
Outputs 2
To find the number of values in the array satisfying the predicate, you can skip np.where and use np.count_nonzero instead:
a = np.array([0, 1, 2])
print(np.count_nonzero(a < -1))
>>> 0
print(np.count_nonzero(a < 2))
>>> 2
If you need to know whether there are any values in a that satisfy the predicate, but not how many there are, a cleaner way of doing so is with np.any:
a = np.array([0, 1, 2])
print(np.any(a < -1))
>>> False
print(np.any(a < 2))
>>> True
np.where takes 3 arguments: condition, x, y, where the last two are array-like and optional. When they are provided, the function returns elements from x at positions where condition is True, and elements from y otherwise. When only condition is provided, it acts like np.asarray(condition).nonzero() and returns a tuple, as in your case. For more details see the Note in the np.where documentation.
Alternatively, because you only need the number of elements for which the condition is True, you can simply use np.sum(condition):
a = np.array([0, 1, 2])
print(np.sum(a < -1))
>>> 0
print(np.sum(a < 2))
>>> 2
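To make the two forms of np.where described above concrete, here is a small sketch. The fill value -1 is an arbitrary choice for illustration.

```python
import numpy as np

a = np.array([0, 1, 2])

# One-argument form: a tuple with one index array per dimension.
indices = np.where(a < 2)
same_indices = np.asarray(a < 2).nonzero()

# Three-argument form: an array of the same shape as the condition,
# taking values from a where the condition holds and -1 elsewhere.
selected = np.where(a < 2, a, -1)

print(indices[0])       # [0 1]
print(same_indices[0])  # [0 1]
print(selected)         # [ 0  1 -1]
```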

Comparing values in two numpy arrays with 'if'

I'm fairly new to numpy arrays and have encountered a problem when comparing one array with another.
I have two arrays, such that:
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])
I want to do something like the following:
if b > a:
    c = b
else:
    c = a
so that I end up with an array c = np.array([2,4,3,5,5]).
This can be otherwise thought of as taking the max value for each element of the two arrays.
However, I am running into the error
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all().
I have tried using these, but I'm not sure that they are right for what I want.
Is someone able to offer some advice in solving this?
You are looking for the function np.fmax. It takes the element-wise maximum of the two arrays, ignoring NaNs.
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])
c = np.fmax(a, b)
The output is
array([2, 4, 3, 5, 5])
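The "ignoring NaNs" part only matters for floating-point data. As a small sketch with made-up values, this is how np.fmax differs from np.maximum when one side is NaN:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 5.0, np.nan])

# np.fmax returns the non-NaN value when only one of the pair is NaN.
ignoring_nan = np.fmax(x, y)

# np.maximum propagates NaN if either value of the pair is NaN.
propagating_nan = np.maximum(x, y)

print(ignoring_nan)     # [2. 5. 3.]
print(propagating_nan)  # [ 2. nan nan]
```

For the integer arrays in the question both functions give the same result.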
As with almost everything else in numpy, comparisons are done element-wise, returning a whole array:
>>> b > a
array([ True, True, False, True, False], dtype=bool)
So, is that true or false? What should an if statement do with it?
Numpy's answer is that it shouldn't try to guess, it should just raise an exception.
If you want to consider it true because at least one value is true, use any:
>>> if np.any(b > a): print('Yes!')
Yes!
If you want to consider it false because not all values are true, use all:
>>> if np.all(b > a): print('Yes!')
But I'm pretty sure you don't want either of these. You want to broadcast the whole if/else over the array.
You could of course wrap the if/else logic for a single value in a function, then explicitly vectorize it and call it:
>>> def mymax(a, b):
...     if b > a:
...         return b
...     else:
...         return a
>>> vmymax = np.vectorize(mymax)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
This is worth knowing how to do… but very rarely worth doing. There's usually a more indirect way to do it using natively-vectorized functions—and often a more direct way, too.
One way to do it indirectly is by using the fact that True and False are numerical 1 and 0:
>>> (b>a)*b + (b<=a)*a
array([2, 4, 3, 5, 5])
This will add the 1*b[i] + 0*a[i] when b>a, and 0*b[i] + 1*a[i] when b<=a. A bit ugly, but not too hard to understand. There are clearer, but more verbose, ways to write this.
But let's look for an even better, direct solution.
First, notice that your mymax function will do exactly the same as Python's built-in max, for 2 values:
>>> vmymax = np.vectorize(max)
>>> vmymax(a, b)
array([2, 4, 3, 5, 5])
Then consider that for something so useful, numpy probably already has it. And a quick search will turn up maximum:
>>> np.maximum(a, b)
array([2, 4, 3, 5, 5])
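For conditions that have no ready-made function like maximum, one way to broadcast an if/else over arrays is np.where. A minimal sketch with the arrays from the question:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([2, 4, 3, 5, 2])

# Element-wise version of "c = b if b > a else a".
c = np.where(b > a, b, a)

print(c)  # [2 4 3 5 5]
```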
Here's another way of achieving this:
c = np.array([y if y>z else z for y,z in zip(a,b)])
The following methods also work:
Use numpy.maximum
>>> np.maximum(a, b)
Use numpy.max and numpy.vstack
>>> np.max(np.vstack((a, b)), axis=0)  # vstack takes a single sequence of arrays
May not be the most efficient one, but this is closer to the original question:
import numpy as np
c = np.zeros(5)  # 1-D result array, one slot per element
a = np.array([1,2,3,4,5])
b = np.array([2,4,3,5,2])
for i in range(5):
    if b.item(i) > a.item(i):
        c[i] = b.item(i)
    else:
        c[i] = a.item(i)

numpy.subtract but only until difference reaches threshold - replace numbers smaller than that with threshold

I want to subtract a given value from each element in my numpy array.
For example, if I have a numpy array called a_q, and variable called subtract_me, then I can simply do this:
result = np.subtract(a_q,subtract_me)
That's fine. But I don't want it to simply subtract blindly from every element. If the difference is lower than a threshold, then I don't want the subtraction to happen. Instead, I want that element of the array to be replaced by that threshold.
What's the most efficient way to do this? I could simply iterate through the array and subtract from each element and put a check condition on whether the threshold has been reached or not, and build a new array out of the results (as below) - but is there a better or more efficient way to do it?
threshold = 3  # in my real program, the threshold is the
               # lowest non-infinity number that python can handle
subtract_me = 6
a_q = []
for i in range(10):
    val = i - subtract_me
    if val < threshold:
        val = threshold
    a_q.append(val)
myarr = np.array(a_q)
print(myarr)
Vectorised methods are typically most efficient with NumPy arrays so here's one way which is likely to be more efficient than iterating over an array one element at a time:
>>> threshold = 3
>>> subtract_me = 6
>>> a_q = np.arange(10)
>>> arr = a_q - subtract_me # subtract the subtract_me value from every element
>>> arr
array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3])
>>> arr[arr < threshold] = threshold # replace any value less than threshold
>>> arr
array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
EDIT: since np.clip was mentioned in the comments below the question, I may as well absorb it into my answer for completeness ;-)
Here's one way you could use it to get the desired result:
>>> np.clip((a_q - subtract_me), threshold, np.max(a_q))
array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
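Since only a lower bound is needed, another option is np.maximum, or np.clip with None as the upper bound. The sketch below uses subtract_me = 2 instead of 6 (an assumption made only so that some values stay above the threshold and the effect is visible):

```python
import numpy as np

threshold = 3
subtract_me = 2  # smaller than in the question, chosen for illustration
a_q = np.arange(10)

# Subtract, then raise anything below the threshold up to the threshold.
with_maximum = np.maximum(a_q - subtract_me, threshold)

# Same result with clip; None means "no upper bound".
with_clip = np.clip(a_q - subtract_me, threshold, None)

print(with_maximum)  # [3 3 3 3 3 3 4 5 6 7]
print(with_clip)     # [3 3 3 3 3 3 4 5 6 7]
```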

Python: How to find the a unique element pattern from 2 arrays?

I have two numpy arrays, A and B:
A = ([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = ([2, 3, 1, 2])
where B is a unique pattern within A.
I need the output to be all the elements of A, which aren't present in B.
Output = ([1, 2, 3, 1, 3])
Easiest is to use Python's builtins, i.e. string type:
A = "123231213"
B = "2312"
result = A.replace(B, "")
To efficiently convert a numpy.array to and from a byte string, use these functions:
x = numpy.frombuffer(b"3452353", dtype="|i1")
x
array([51, 52, 53, 50, 51, 53, 51], dtype=int8)
x.tobytes()
b"3452353"
Note that this mixes up ASCII codes (1 != "1"), but substring search will work just fine. Your data type should preferably fit in one char, or you may get a false match.
To sum it up, a quick hack looks like this:
A = numpy.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = numpy.array([2, 3, 1, 2])
numpy.frombuffer(A.tobytes().replace(B.tobytes(), b""), dtype=A.dtype)
array([1, 2, 3, 1, 3])
# note, here dtype is some int, I'm relying on the fact that:
# "1 matches 1" is equivalent to "0001 matches 00001"
# this holds as long as values of B are typically non-zero.
#
# this trick can conceptually be used with floating point too,
# but beware of multiple floating point representations of same number
In depth explanation:
Assuming the sizes of A and B are arbitrary, the naive approach runs in quadratic time. However, better probabilistic algorithms exist, for example Rabin-Karp, which relies on a sliding window hash.
This is the main reason text-oriented functions, such as xxx in str, str.replace or re, will be much faster than custom numpy code.
If you truly need this function to be integrated with numpy, you can always write an extension, but it's not easy :)
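If you would rather stay within numpy and avoid the byte-string trick, a naive search can be sketched with sliding_window_view (available in numpy 1.20 and later). This is the quadratic approach mentioned above, not Rabin-Karp, and it removes only the first occurrence of B:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = np.array([2, 3, 1, 2])

# Every contiguous window of A with the same length as B.
windows = sliding_window_view(A, len(B))

# Start positions where the whole window equals B.
starts = np.flatnonzero((windows == B).all(axis=1))

if starts.size:
    first = starts[0]
    # Drop the matched stretch from A.
    result = np.delete(A, np.arange(first, first + len(B)))
else:
    result = A.copy()

print(starts)  # [3]
print(result)  # [1 2 3 1 3]
```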

compare two following values in numpy array

What is the best way to access two consecutive values in a numpy array?
example:
npdata = np.array([13,15,20,25])
for i in range(len(npdata)):
    print(npdata[i] - npdata[i+1])
this looks really messed up and additionally needs exception code for the last iteration of the loop.
any ideas?
Thanks!
numpy provides a function diff for this basic use case
>>> import numpy
>>> x = numpy.array([1, 2, 4, 7, 0])
>>> numpy.diff(x)
array([ 1, 2, 3, -7])
Your snippet computes something closer to -numpy.diff(x).
How about range(len(npdata) - 1) ?
Here's code (using a simple array, but it doesn't matter):
>>> ar = [1, 2, 3, 4, 5]
>>> for i in range(len(ar) - 1):
...     print(ar[i] + ar[i + 1])
...
3
5
7
9
As you can see it successfully prints the sums of all consecutive pairs in the array, without any exceptions for the last iteration.
You can use ediff1d to get differences of consecutive elements. More generally, a[1:] - a[:-1] will give the differences of consecutive elements and can be used with other operators as well.
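As a sketch of the slicing approach with the array from the question, here are the differences in both directions plus one other operator:

```python
import numpy as np

npdata = np.array([13, 15, 20, 25])

# Next element minus current element, the same as np.diff(npdata).
forward = npdata[1:] - npdata[:-1]

# Current element minus next element, which is what the loop in the question prints.
backward = npdata[:-1] - npdata[1:]

# The same pairing works with any other element-wise operator.
pair_sums = npdata[:-1] + npdata[1:]

print(forward)    # [2 5 5]
print(backward)   # [-2 -5 -5]
print(pair_sums)  # [28 35 45]
```

Both slices have length len(npdata) - 1, so no special handling of the last element is needed.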
