What does this: s[s[1:] == s[:-1]] do in numpy? - python

I've been looking for a way to efficiently check for duplicates in a numpy array and stumbled upon a question that contained an answer using this code.
What does this line mean in numpy?
s[s[1:] == s[:-1]]
I'd like to understand the code before applying it. I looked in the NumPy docs but had trouble finding this information.

The slices [1:] and [:-1] mean all but the first and all but the last elements of the array:
>>> import numpy as np
>>> s = np.array((1, 2, 2, 3)) # four element array
>>> s[1:]
array([2, 2, 3]) # last three elements
>>> s[:-1]
array([1, 2, 2]) # first three elements
Therefore the comparison produces an array of booleans, comparing each element s[x] with its neighbour s[x+1]; the result is one element shorter than the original array (as the last element has no neighbour):
>>> s[1:] == s[:-1]
array([False, True, False], dtype=bool)
and using that array to index the original array gets you the elements where the comparison is True, i.e. the elements that are the same as their neighbour:
>>> s[s[1:] == s[:-1]]
array([2])
Note that this only identifies adjacent duplicate values.
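As a follow-up (a minimal sketch, not part of the original answer): to catch duplicates anywhere in an unsorted array, you can sort first so that equal values become neighbours. Note that newer NumPy versions require the boolean mask to match the length of the array it indexes, so the mask below is applied to the slice t[1:] rather than to t itself:
>>> s = np.array([3, 1, 2, 3, 5, 2])
>>> t = np.sort(s)              # equal values become adjacent
>>> t[1:][t[1:] == t[:-1]]      # the duplicated values
array([2, 3])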

Check this out:
>>> import numpy
>>> s = numpy.array([1, 3, 5, 6, 7, 7, 8, 9])
>>> s[1:] == s[:-1]
array([False, False, False, False, True, False, False], dtype=bool)
>>> s[s[1:] == s[:-1]]
array([7])
So s[1:] gives all numbers but the first, and s[:-1] all but the last.
Comparing these two vectors checks whether each pair of adjacent elements is equal. Finally, those elements are selected.

s[1:] == s[:-1] compares s without the first element with s without the last element, i.e. 0th with 1st, 1st with 2nd etc, giving you an array of len(s) - 1 boolean elements. s[boolarray] will select only those elements from s which have True at the corresponding place in boolarray. Thus, the code extracts all elements that are equal to the next element.
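A tiny illustration of that last indexing step (my own example, not from the answer):
>>> import numpy as np
>>> s = np.array([10, 20, 30, 40])
>>> s[np.array([True, False, True, False])]
array([10, 30])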

It will show duplicates in a sorted array.
Basically, the inner expression s[1:] == s[:-1] compares the array with its shifted version. Imagine this:
s[1:]  : [2, 3, ..., n-1, n  ]
s[:-1] : [1, 2, ..., n-2, n-1]
  ==   : [F, F, ...,  F,  F  ]
In a sorted array, the resulting array contains no True unless there is a repetition. The expression s[boolarray] then selects the elements whose position holds True in the boolean array.

Related

numpy multiple boolean index arrays

I have an array which I want to use boolean indexing on, with multiple index arrays, each producing a different array. Example:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
Should return something along the lines of:
[[2,3], [1]]
I assume that since the number of cells containing True can vary between masks, I cannot expect the result to reside in a 2d numpy array, but I'm still hoping for something more elegant than iterating over the masks and appending the result of indexing w by the i-th mask of b.
Am I missing a better option?
Edit: The next step I want to do afterwards is to sum each of the arrays returned by w[b], returning a list of scalars. If that somehow makes the problem easier, I'd love to know as well.
Assuming you want a list of numpy arrays you can simply use a comprehension:
w = np.array([1,2,3])
b = np.array([[False, True, True], [True, False, False]])
[w[mask] for mask in b]
# [array([2, 3]), array([1])]
If your goal is just the sum of the masked values, you can use:
np.sum(w*b) # 6
or
np.sum(w*b, axis=1) # array([5, 1])
# or b @ w
…since False times your number will be 0 and therefore won't affect the sum.
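Putting the sum variants together in one runnable sketch (reading the terse comment above as the matrix product b @ w, which gives the same per-mask sums):

import numpy as np

w = np.array([1, 2, 3])
b = np.array([[False, True, True], [True, False, False]])

print(np.sum(w * b, axis=1))  # [5 1]  per-mask sums
print(b @ w)                  # [5 1]  same result as a matrix product
print(np.sum(w * b))          # 6      grand total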
Try this:
[w[x] for x in b]
Hope this helps.

How to pythonically select a random index from a 2D list such that the corresponding element matches a value?

I have a 2D list of booleans. I want to select a random index from the list where the value is False. For example, given the following list:
[[True, False, False],
[True, True, True],
[False, True, True]]
The valid choices would be: [0, 1], [0, 2], and [2, 0].
I could keep a list of valid indices and then use random.choice to select from it, but it seems unpythonic to keep a variable and update it every time the underlying list changes for only this one purpose.
Bonus points if your answer runs quickly.
We can use a oneliner like:
import numpy as np
from random import choice
choice(np.argwhere(~a))
where a is the array of booleans.
This works as follows: by using ~a, we negate the elements of the array. Next we use np.argwhere to construct a k×2 array, where every row holds, for each dimension, the index of an element whose value is False.
With choice(..) we then select a random row. We cannot use that row directly to index an element, but we can cast it to a tuple with the tuple(..) constructor:
>>> tuple(choice(np.argwhere(~a)))
(2, 0)
You can then fetch the element with:
t = tuple(choice(np.argwhere(~a)))
a[t]
But of course, it is not a surprise that:
>>> t = tuple(choice(np.argwhere(~a)))
>>> a[t]
False
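Putting the pieces together as one runnable sketch (building a explicitly from the question's example grid, since the answer assumes a is already a NumPy array):

import numpy as np
from random import choice

a = np.array([[True, False, False],
              [True, True, True],
              [False, True, True]])

idx = tuple(choice(np.argwhere(~a)))  # e.g. (2, 0)
print(idx, a[idx])                    # the value at idx is always False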
My non-numpy version:
import random

result = random.choice([
    (i, j)
    for i in range(len(a))
    for j in range(len(a[i]))
    if not a[i][j]])
Like Willem's np version, this generates a list of valid tuples and invokes random.choice() to pick one.
Alternatively, if you hate seeing range(len(...)) as much as I do, here is an enumerate() version:
result = random.choice([
    (i, j)
    for i, row in enumerate(a)
    for j, cell in enumerate(row)
    if not cell])
Assuming you don't want to use numpy.
import random

matrix = [[True, False, False],
          [True, True, True],
          [False, True, True]]

valid_choices = [(i, j) for i, x in enumerate(matrix) for j, y in enumerate(x) if not y]
random.choice(valid_choices)
With list comprehensions, you can change the if condition (if not y) to suit your needs. This returns the randomly selected coordinate; optionally, you could change the value part of the comprehension ((i, j) here) to y, which would return False, though that's a bit redundant in this case.

Convert boolean to integer location in python [duplicate]

This question already has answers here: Getting indices of True values in a boolean list (9 answers).
I have a boolean list, say:
x = [True, False, False, True]
How do you convert this list to integer locations, so that you get the following result?
y = [1, 4]
You could use a list comprehension in combination with the enumerate function, for example:
>>> x = [True, False, False, True]
>>> [index for index, element in enumerate(x, start=1) if element]
[1, 4]
Alternatively, if you're willing to use NumPy and get a result of type numpy.ndarray, there's a NumPy function that (almost) does what you need: numpy.where.
>>> import numpy
>>> numpy.where(x)
(array([0, 3]),)
>>> numpy.where(x)[0] + 1
array([1, 4])
The strange [0] in the line above is there because numpy.where always returns its results in a tuple: one element of the tuple for each dimension of the input array. Since in this case the input array is one-dimensional, we don't really care about the outer tuple structure, so we use the [0] indexing operation to pull out the actual array we need. The + 1 is there to get from Python / NumPy's standard 0-based indexing to the 1-based indexing that it looks as though you want here.
If you're working with large input data (and especially if the input list is already in the form of a NumPy array), the NumPy solution is likely to be significantly faster than the list comprehension.
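For the one-dimensional case there is also numpy.flatnonzero, which does the tuple-unwrapping for you (a small variant, not from the original answer):
>>> numpy.flatnonzero(x) + 1
array([1, 4])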
Use enumerate within a list comprehension:
>>> [i for i, j in enumerate(x, 1) if j]
[1, 4]
A simple one-liner would be:
[i+1 for i in range(len(x)) if x[i]]

Deleting elements at specific positions in a M X N numpy array

I am trying to implement the seam carving algorithm, in which we have to delete a seam from the image. The image is stored as a numpy M x N array. I have found the seam, which is an array of M integers whose values specify the column to be deleted in each row.
Eg: a 2 X 3 array
import numpy
img_array = numpy.array([[1, 2, 3],[4, 5, 6]])
and
seam = numpy.array([1,2])
This means that we have to delete the 1st element of the 1st row (the value 1) and the 2nd element of the 2nd row (the value 5). After deletion, img_array will be
print img_array
[[2,3]
[4,6]]
Work done:
I am new to Python and I have found solutions that deal with single-dimensional arrays or with deleting an entire row or column, but I could not find a way to delete individual elements at specific columns.
Will you always delete one element from each row? If you try to delete one element from one row, but not another, you will end up with a ragged array. That is why there isn't a general purpose way of removing single elements from a 2d array.
One option is to figure out which ones you want to delete, remove them from a flattened array, and then reshape it back to the correct shape. Then it is your responsibility to ensure that the correct number of elements are removed.
All of these 'delete' methods actually copy the 'keep' values to a new array. Nothing actually deletes elements from the original array. So you could just as easily (and just as fast) do your own copy to a new array.
Another option is to work with lists of lists. Those are more tolerant of becoming ragged.
Here's an example of using a boolean mask to remove selected elements from an array (making a copy of course):
In [100]: x=np.arange(1,7).reshape(2,3)
In [101]: x
Out[101]:
array([[1, 2, 3],
[4, 5, 6]])
In [102]: mask=np.ones_like(x,bool)
In [103]: mask
Out[103]:
array([[ True, True, True],
[ True, True, True]], dtype=bool)
In [104]: mask[0,0]=False
In [105]: mask[1,1]=False
In [106]: mask
Out[106]:
array([[False, True, True],
[ True, False, True]], dtype=bool)
In [107]: x[mask]
Out[107]: array([2, 3, 4, 6]) # it's flat
In [108]: x[mask].reshape(2,2)
Out[108]:
array([[2, 3],
[4, 6]])
Notice that even though both x and mask are 2d, the indexing result is flattened. Such a mask could easily have produced an array that couldn't be reshaped back to 2d.
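Applied to the question's seam, the same mask idea might look like this (a sketch assuming, as in the question, that the seam holds 1-based column indices and that exactly one element is removed per row):

import numpy as np

img = np.array([[1, 2, 3], [4, 5, 6]])
seam = np.array([1, 2])                           # 1-based column to drop in each row

mask = np.ones(img.shape, dtype=bool)
mask[np.arange(img.shape[0]), seam - 1] = False   # mark one element per row for removal
carved = img[mask].reshape(img.shape[0], img.shape[1] - 1)
print(carved)                                     # [[2 3]
                                                  #  [4 6]]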
Each row in your matrix is a single dimensional array.
import numpy
ary=numpy.array([[1,2,3],[4,5,6]])
print ary[0]
Gives
array([1, 2, 3])
You could iterate over your matrix, using the values from your seam to remove an element from the current row, and append the result to the modified matrix you are building.
seam = numpy.array([1,2])
for i in range(2):
    tmp = numpy.delete(ary[i], seam[i]-1)
    if i == 0:
        modified_ary = tmp
    else:
        modified_ary = numpy.vstack((modified_ary, tmp))
print modified_ary
Gives
[[2 3]
[4 6]]

How to vectorize a set of items in python

I am trying to take a set of arrays and convert them into a matrix that will essentially be an indicator matrix for a set of items.
I currently have an array of N items:
A_ = [A,B,C,D,E,...,Y,Z]
In addition, I have S arrays (currently stored in an array), each containing a subset of the items in A_.
B_ = [A,B,C,Z]
C_ = [A,B]
D_ = [D,Y,Z]
The array they are stored in is structured like so:
X = [B_,C_,D_]
I would like to convert the data into an indicator matrix for easier operation. It would ideally look like this (an S x N matrix, with one row per subset):
[1,1,1,0,...,0,1]
[1,1,0,0,...,0,0]
[0,0,0,1,...,1,1]
I know how I could use a for loop to iterate through this and create the matrix but I was wondering if there is a more efficient/syntactically simple way of going about this.
A concise way would be to use a list comprehension.
# Create a list containing the alphabet using a list comprehension
A_ = [chr(i) for i in range(65,91)]
# A list containing two sub-lists with some letters
M = [["A","B","C","Z"],["A","B","G"]]
# Nested list comprehension to convert character matrix
# into matrix of indicator vectors
I_M = [[1 if char in sublist else 0 for char in A_] for sublist in M]
The last line is a bit dense if you aren't familiar with comprehensions, but it's not too tricky once you take it apart. The inner part...
[1 if char in sublist else 0 for char in A_]
is a list comprehension in itself, which creates a list containing 1's for all characters (char) in A_ that are also found in sublist, and 0's for characters not found in sublist.
The outer bit...
[ ... for sublist in M]
simply runs the inner list comprehension for each sublist found in M, resulting in a list of all the sublists created by the inner list comprehension stored in I_M.
Edit:
While I tried to keep this example simple, it is worth noting (as DSM and jterrace point out) that testing membership in a plain list is O(N). Converting each sublist to a hash-based structure such as a set would speed up the checks for large sublists.
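For example, a set-based variant (my own sketch, reusing A_ and M from above) might look like:
M_sets = [set(sublist) for sublist in M]          # sets give O(1) membership tests
I_M = [[1 if char in chars else 0 for char in A_] for chars in M_sets]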
Using numpy:
>>> import numpy as np
>>> A_ = np.array(['A','B','C','D','E','Y','Z'])
>>> B_ = np.array(['A','B','C','Z'])
>>> C_ = np.array(['A','B'])
>>> D_ = np.array(['D','Y','Z'])
>>> X = [B_,C_,D_]
>>> matrix = np.array([np.in1d(A_, x) for x in X])
>>> matrix.shape
(3, 7)
>>> matrix
array([[ True, True, True, False, False, False, True],
[ True, True, False, False, False, False, False],
[False, False, False, True, False, True, True]], dtype=bool)
This is O(NS).
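If 0/1 integers are preferred over booleans, as in the question's sketch of the desired output, the boolean result can simply be cast:
>>> matrix.astype(int)
array([[1, 1, 1, 0, 0, 0, 1],
       [1, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 1, 1]])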
