python numpy - unable to compare 2 arrays

I have the 2 arrays as follows:
x = array(['2019-02-28', '2019-03-01'], dtype=object)
z = array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
I'm trying to use np.where to determine at which indices the two arrays are aligned.
I'm doing
i = np.where(z == x)
but it doesn't work; I get an empty array as a result. It looks like it's comparing the whole of one array to the whole of the other, whereas I'm looking for the matching values and would like to get elementwise matching results between the two. How should I do it?
Thanks
Regards
edit: the expected outcome is [True, False, False]

The where result is only as good as the boolean it searches. If the argument does not have any True values, where returns empty:
In [308]: x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
...: z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
In [309]: x==z
/usr/local/bin/ipython3:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
#!/usr/bin/python3
Out[309]: False
If you aren't concerned about order:
In [311]: np.isin(z,x)
Out[311]: array([ True, False, True])
or trimming z:
In [312]: x==z[:2]
Out[312]: array([ True, False])
To extend x you could first use np.pad, or use itertools.zip_longest:
In [353]: list(itertools.zip_longest(x,z))
Out[353]:
[('2019-02-28', '2019-02-28'),
('2019-03-01', '2019-03-02'),
(None, '2019-03-01')]
In [354]: [i==j for i,j in itertools.zip_longest(x,z)]
Out[354]: [True, False, False]
zip_longest accepts other fill values if that makes the comparison better.
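A minimal sketch of the np.pad route mentioned above, assuming an empty string as the fill value (so the padded slot never equals a real date):
xp = np.pad(x, (0, z.size - x.size), mode='constant', constant_values='')
xp == z
# array([ True, False, False])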

Is this what you need:
print([i for i, (x, y) in enumerate(zip(x, z)) if x == y])
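For the question's sample arrays this returns the indices of the matches rather than a boolean mask (loop variables renamed here to avoid shadowing x):
import numpy as np
x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
print([i for i, (a, b) in enumerate(zip(x, z)) if a == b])  # [0]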

As the two arrays have different sizes, compare over the minimum of the two sizes.
Edit:
I just reread the question and comments.
result= np.zeros( max(x.size, z.size), dtype=bool) # result size of the biggest array.
size = min(x.size, z.size)
result[:size] = z[:size] == x[:size] # Comparison at smallest size.
result
# array([ True, False, False])
This gives the boolean mask the comment asks for.
Original answer
import numpy as np
x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
size = min(x.size, z.size)
np.where(z[:size]==x[:size]) # Select the common range
# (array([0], dtype=int64),)
On my machine this is slower than the list comprehension from @U10-Forward for dtype=object, but faster if numpy selects the dtype itself ('<U10'):
x = np.array(['2019-02-28', '2019-03-01'])
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'])
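# without dtype=object, numpy picks dtype('<U10') for both arrays here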

Related

Is there a 2-D "where" in numpy?

This might seem an odd question, but it boils down to quite a simple operation that I can't find a numpy equivalent for. I've looked at np.where as well as many other operations but can't find anything that does this:
a = np.array([1,2,3])
b = np.array([1,2,3,4])
c = np.array([i<b for i in a])
The output is a 2-D (3, 4) array of booleans comparing each value.
If you're asking how to get c without a loop, try this:
# make "a" a column vector
# > broadcasts to produce a len(a) x len(b) array
c = b > a[:, None]
c
array([[False,  True,  True,  True],
       [False, False,  True,  True],
       [False, False, False,  True]])
You can extend the approach in the other answer to get the values of a and b. Given a mask of
c = b > a[:, None]
You can extract the indices for each dimension using np.where or np.nonzero:
row, col = np.nonzero(c)
And use the indices to get the corresponding values:
ag = a[row]
bg = b[col]
Elements of a and b may be repeated in the result.
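A minimal end-to-end run with the sample a and b, showing what the indices and the gathered values look like:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = b > a[:, None]
row, col = np.nonzero(c)
a[row]   # array([1, 1, 1, 2, 2, 3])
b[col]   # array([2, 3, 4, 3, 4, 4])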

Counting the number of None values using a lambda function

I have an array consisting of a bunch of values, where some of them are NaN and others are None. I want to count each of them. I can achieve this with a simple for loop as shown:
xx = np.array([2,3,4,None,np.nan,None])
count_None = 0
count_nan = 0
for i in xx:
    if i is None:
        count_None += 1
    if i is np.nan:
        count_nan += 1
I want to find out if I can achieve the same result in one line, perhaps using a lambda function. I tried writing it like so, but of course the syntax is incorrect. Any ideas?
lambda xx: count_None =+1 if xx is None
One way of achieving it as a one-liner is:
len([i for i in xx if i is None])
# or, after converting to a plain list, the count method
list(xx).count(None)
or you can use the numpy.count_nonzero:
np.count_nonzero(xx == None)
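# 2 for the sample xx above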
Using a lambda function, you can create a list.count()-like function:
>>> counter = lambda x,y:len([i for i in x if i == y])
>>> counter(xx,None)
2
This isn't a lambda but it creates a new list of just the None values and counts the length of that list.
import numpy as np
xx = np.array([2,3,4,None,np.nan,None])
print(len([elem for elem in xx if elem is None]))
if you don't need it to be in numpy you can use the list count method
xx = [2,3,4,None,np.nan,None]
print(xx.count(None))
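# 2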
A third approach:
>>> nan_count, none_count = np.sum([i is np.nan for i in xx]), np.sum([i is None for i in xx])
>>> print(nan_count, none_count)
1 2
I'd tend to prefer two lines (one for each computation), but this works. It works by adding 1 for each True value, and 0 for each False value.
Another approach if you really want to use a lambda is to use functools.reduce which will perform the sum iteratively. Here, we start with a value of 0, and add 1 for each element that evaluates true:
>>> import functools
>>> functools.reduce(lambda x,y: x+(y is np.nan), xx, 0)
1
>>> functools.reduce(lambda x,y: x+(y is None), xx, 0)
2
l = len(list(filter(lambda x: x is None, xx)))
It will return the number of None values. filter works with any iterable, so it accepts the array as well as a plain list.
You can use this approach if you want to use a lambda.
I prefer using the numpy function (np.count_nonzero).
lambda is just a restricted format for creating a function. It is 'one-line' and returns a value. It should not be used for side effects. Your use of counter += 1 is a side effect, so it can't be used in a lambda.
A lambda that identifies the None values can be used with map:
In [27]: alist = [2,3,4,None,np.nan,None]
In [28]: list(map(lambda x: x is None, alist))
Out[28]: [False, False, False, True, False, True]
map returns an iterable, which has to be expanded with list, or with sum:
In [29]: sum(map(lambda x: x is None, alist))
Out[29]: 2
But as others have shown, the list count method is simpler.
In [43]: alist.count(None)
Out[43]: 2
In [44]: alist.count(np.nan)
Out[44]: 1
An array containing None will be object dtype. Iteration on such an array is slower than iteration on the list:
In [45]: arr = np.array(alist)
In [46]: arr
Out[46]: array([2, 3, 4, None, nan, None], dtype=object)
The array doesn't have the count method. Also testing for np.nan is trickier.
In [47]: arr == None
Out[47]: array([False, False, False, True, False, True])
In [48]: arr == np.nan
Out[48]: array([False, False, False, False, False, False])
There is a np.isnan function, but that only works for float dtype arrays.
In [51]: arr.astype(float)
Out[51]: array([ 2., 3., 4., nan, nan, nan])
In [52]: np.isnan(arr.astype(float))
Out[52]: array([False, False, False, True, True, True])
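Putting those pieces together, a small sketch (using the arr above) that gets both counts from the object array, relying on the cast to float turning both None and nan into nan:
none_count = (arr == None).sum()                            # None compares elementwise on object dtype
nan_count = np.isnan(arr.astype(float)).sum() - none_count  # remove the Nones that also became nan
# none_count -> 2, nan_count -> 1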

numpy multiple boolean index arrays

I have an array which I want to use boolean indexing on, with multiple index arrays, each producing a different array. Example:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
Should return something along the lines of:
[[2,3], [1]]
I assume that since the number of cells containing True can vary between masks, I cannot expect the result to reside in a 2d numpy array, but I'm still hoping for something more elegant than iterating over the masks and appending the result of indexing w by the i-th mask of b.
Am I missing a better option?
Edit: The next step I want to do afterwards is to sum each of the arrays returned by w[b], returning a list of scalars. If that somehow makes the problem easier, I'd love to know as well.
Assuming you want a list of numpy arrays you can simply use a comprehension:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
[w[mask] for mask in b]
# [array([2, 3]), array([1])]
If your goal is just a sum of the masked values you can use:
np.sum(w*b) # 6
or
np.sum(w*b, axis=1) # array([5, 1])
# or b @ w
…since False times your number will be 0 and therefore won't affect the sum.
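If you specifically want a Python list of per-mask sums, a short sketch based on the comprehension above:
[w[mask].sum() for mask in b]   # [5, 1]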
Try this:
[w[x] for x in b]
Hope this helps.

Elementwise comparison to None with ndarray of object dtype

x = np.empty([2], dtype=object)
> array([None, None], dtype=object)
x[0] = 'a'
> array(['a', None], dtype=object)
I'm trying to get a boolean array [False, True] from this object-dtype ndarray, marking where the element is None.
Things that don't work: x is None, x.isfinite(), x == None, np.isnan(x). The array may have n dimensions, making for-loop iteration unpleasant to look at.
In NumPy 1.12 and earlier, you'll need to explicitly call numpy.equal to get a broadcasted equality comparison. Leave a comment, so future readers understand why you're doing it:
# Comparisons to None with == don't broadcast (yet, as of NumPy 1.12).
# We need to use numpy.equal explicitly.
numpy.equal(x, None)
In NumPy 1.13 and later, x == None will give you a broadcasted equality comparison, but you can still use numpy.equal(x, None) if you want backward compatibility with earlier versions.
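Since the question mentions n-dimensional arrays, a quick sketch (a hypothetical 2-D example) showing the same elementwise comparison applied to a higher-dimensional object array:
import numpy
x2 = numpy.empty((2, 2), dtype=object)
x2[0, 0] = 'a'
numpy.equal(x2, None)
# array([[False,  True],
#        [ True,  True]])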
You can wrap None in a list or array to force element-wise comparisons:
>>> x == [None]
array([False, True], dtype=bool)
>>> x == np.array([None])
array([False, True], dtype=bool)
Another possible workaround, if you know what the non-None values are, is to compare against one of them:
x != 'a'
array([False,  True], dtype=bool)

Creating a "bitmask" from several boolean numpy arrays

I'm trying to convert several masks (boolean arrays) to a bitmask with numpy; while that in theory works, I feel that I'm doing too many operations.
For example to create the bitmask I use:
import numpy as np
flags = [
    np.array([True, False, False]),
    np.array([False, True, False]),
    np.array([False, True, False])
]
flag_bits = np.zeros(3, dtype=np.int8)
for idx, flag in enumerate(flags):
    flag_bits += flag.astype(np.int8) << idx  # equivalent to flag * 2 ** idx
Which gives me the expected "bitmask":
>>> flag_bits
array([1, 6, 0], dtype=int8)
>>> [np.binary_repr(bit, width=7) for bit in flag_bits]
['0000001', '0000110', '0000000']
However I feel that especially the casting to int8 and the addition with the flag_bits array is too complicated. Therefore I wanted to ask if there is any NumPy functionality that I missed that could be used to create such a "bitmask" array?
Note: I'm calling an external function that expects such a bitmask, otherwise I would stick with the boolean arrays.
>>> x = np.array([2**i for i in range(np.shape(flags)[1])])
>>> np.dot(flags, x)
array([1, 2, 2])
How it works: in a bitmask, every bit is effectively an original array element multiplied by a power of 2 according to its position, e.g. 2 = False * 1 + True * 2 + False * 4. Effectively this can be represented as matrix multiplication, which is really efficient in numpy.
So, the first line is a list comprehension to create these weights: x = [1, 2, 4, 8, ... 2^(n-1)].
Then, each row in flags is multiplied by the corresponding element in x and everything is summed up (this is how matrix multiplication works). At the end, we get the bitmask. Note that this treats each row of flags as the bits of one value; to reproduce the question's array([1, 6, 0]) you dot the weights with flags the other way around, as in the last answer below.
How about this (added conversion to int8, if desired):
flag_bits = (np.transpose(flags) << np.arange(len(flags))).sum(axis=1)\
.astype(np.int8)
#array([1, 6, 0], dtype=int8)
Here's an approach to directly get to the string bitmask with boolean-indexing -
out = np.repeat('0000000',3).astype('S7')
out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T
Sample run -
In [41]: flags
Out[41]:
[array([ True, False, False], dtype=bool),
array([False, True, False], dtype=bool),
array([False, True, False], dtype=bool)]
In [42]: out = np.repeat('0000000',3).astype('S7')
In [43]: out.view('S1').reshape(-1,7)[:,-3:] = np.asarray(flags).astype(int)[::-1].T
In [44]: out
Out[44]:
array([b'0000001', b'0000110', b'0000000'],
dtype='|S7')
Using the same matrix-multiplication strategy as discussed in detail in @Marat's solution, but using a vectorized scaling array that gives us flag_bits -
np.dot(2**np.arange(3),flags)
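# array([1, 6, 0])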
