numpy - select by multiple equalities dynamically - python

I want to select elements of an array according to multiple conditions but dynamically.
I'd define them as follows:
L = [1,2,5]
X = np.random.choice(10, size=(15,))
X[X in L]
I know I can do it as X[(X==1)|(X==2)|(X==5)] but my question goes for dynamically changing L, suppose it is an arbitrary list of integers.
index = np.zeros_like(X, dtype=bool)
for i in L:
    index |= (X == i)
X[index]
Is there a better way to perform this?

You want all elements of the input array X that also belong to L.
You can use numpy.isin(...):
X[np.isin(X, L)]
np.isin(X, L) returns a Boolean array with the same shape as X, holding True where the element of X belongs to L and False otherwise.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.isin.html
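As a minimal sketch of how this behaves when L changes at run time (the helper name select_members is made up for this illustration and is not part of NumPy):

```python
import numpy as np

def select_members(X, L):
    """Return the elements of X whose value appears in L (hypothetical helper)."""
    mask = np.isin(X, L)   # boolean array, same shape as X
    return X[mask]

X = np.array([1, 7, 2, 5, 5, 3, 9, 1])

print(select_members(X, [1, 2, 5]))   # [1 2 5 5 1]
print(select_members(X, [3, 9]))      # [3 9], a different list with the same code
print(X[~np.isin(X, [1, 2, 5])])      # [7 3 9], inverting the mask excludes instead
```

L can be any list, tuple or array of integers; nothing in the selection code depends on its length.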

Related

Finding the indices of distinct elements in a vectorized manner

I have a list of ints, a, between 0 and 3000. len(a) = 3000. I have a for loop that iterates through this list, searching for the indices of each element in a larger array.
import numpy as np
a = list(range(3000))
array = np.random.randint(0, 3000, size=(12, 1000, 1000))
newlist = []
for i in range(len(a)):
    coord = np.where(array == a[i])
    newlist.append(coord)
As you can see, coord will be 3 arrays of the coordinates x, y, z for the values in the 3D matrix that equal the value in the list.
Is there a way to do this in a vectorized manner without the for loop?
The output should be a list of tuples, one for each element in a:
# each coord looks like this:
print(coord)
(array[1, ..., 1000], array[2, ..., 1000], array[2, ..., 12])
# combined over all the iterations:
print(newlist)
[coord1, coord2, ..., coord3000]
There is actually a fully vectorized solution to this, despite the fact that the resulting arrays are all of different sizes. The idea is this:
Sort all the elements of the array along with their coordinates. argsort is ideal for this sort of thing.
Find the cut-points in the sorted data, so you know where to split the array, e.g. with diff and flatnonzero.
Split the coordinate array along the indices you found. If you have missing elements, you may need to generate a key based on the first element of each run.
Here is an example to walk you through it. Let's say you have a d-dimensional array with size n. Your coordinates will be a (d, n) array:
d = arr.ndim
n = arr.size
You can generate the coordinate arrays with np.indices directly:
coords = np.indices(arr.shape)
Now ravel/reshape the data and the coordinates into an (n,) and (d, n) array, respectively:
arr = arr.ravel() # Ravel guarantees C-order no matter the source of the data
coords = coords.reshape(d, n) # C-order by default as a result of `indices` too
Now sort the data:
order = np.argsort(arr)
arr = arr[order]
coords = coords[:, order]
Find the locations where the data changes values. You want the indices of the new values, so we can make a fake first element that is 1 less than the actual first element.
change = np.diff(arr, prepend=arr[0] - 1)
The indices of the locations give the break-points in the array:
locs = np.flatnonzero(change)
You can now split the data at those locations:
result = np.split(coords, locs[1:], axis=1)
And you can create the key of values actually found:
key = arr[locs]
If you are very confident that all the values are present in the array, then you don't need the key. Instead, you can compute locs as just np.flatnonzero(np.diff(arr)) + 1 and result as just np.split(coords, locs, axis=1).
Each element in result is already consistent with the indexing used by where/nonzero, but as a numpy array. If you specifically want a tuple, you can map it to a tuple:
result = [tuple(inds) for inds in result]
TL;DR
Combining all this into a function:
def find_locations(arr):
    coords = np.indices(arr.shape).reshape(arr.ndim, arr.size)
    arr = arr.ravel()
    order = np.argsort(arr)
    arr = arr[order]
    coords = coords[:, order]
    locs = np.flatnonzero(np.diff(arr, prepend=arr[0] - 1))
    return arr[locs], np.split(coords, locs[1:], axis=1)
You can return a list of index arrays with empty arrays for missing elements by replacing the last line with
pieces = np.split(coords, locs[1:], axis=1)
result = [np.empty((coords.shape[0], 0), dtype=int)] * 3000 # Empty array, so OK to use same reference
for value, piece in zip(arr[locs], pieces):
    result[value] = piece
return result
You can optionally filter for values that are in the specific range you want (e.g. 0-2999).
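To check the sort-and-split idea on something small, here is a sketch that runs the same steps on a 2x3 array and compares every group against np.where. The kind="stable" argument is an addition not present in the answer above; it is assumed here so that the coordinates inside each group come out in the same order np.where produces.

```python
import numpy as np

def find_locations(arr):
    """Group the coordinates of arr by value using a single sort."""
    coords = np.indices(arr.shape).reshape(arr.ndim, arr.size)
    flat = arr.ravel()
    order = np.argsort(flat, kind="stable")  # stable keeps C-order within each value
    flat = flat[order]
    coords = coords[:, order]
    # Start index of each run of equal values in the sorted data
    locs = np.flatnonzero(np.diff(flat, prepend=flat[0] - 1))
    return flat[locs], np.split(coords, locs[1:], axis=1)

small = np.array([[2, 0, 2],
                  [5, 2, 0]])
key, groups = find_locations(small)

for value, group in zip(key, groups):
    expected = np.where(small == value)
    # Each group is an (ndim, count) array matching np.where's tuple of index arrays
    assert all(np.array_equal(g, e) for g, e in zip(group, expected))
    print(value, tuple(group))
```

The key here is [0, 2, 5]: only values that actually occur in the array get a group, which is why the key is needed when some values may be missing.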
You can use logical OR in NumPy to combine all those equality conditions into a single mask instead of calling np.where once per value.
import numpy as np
# array3d is the 3D array from the question, a is the list of values
conditions = False
for i in a:
    conditions = np.logical_or(conditions, array3d == i)
newlist = np.where(conditions)
This still compares the array against each value in turn, but np.where runs only once, on the combined mask.
Another way to do it more compactly:
np.where(np.isin(array3d, a))
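A small sketch of what the combined mask contains, using a made-up 2x2x2 array and a list named values (both are illustrative, not from the question). It shows that the OR-ed mask and np.isin agree, and that the result covers all listed values at once rather than one coordinate tuple per value:

```python
import numpy as np
from functools import reduce

array3d = np.array([[[0, 3], [1, 2]],
                    [[3, 3], [4, 1]]])
values = [1, 3]

# One mask built from all equality tests, OR-ed together
mask_or = reduce(np.logical_or, (array3d == v for v in values))
# The same mask from a single membership test
mask_isin = np.isin(array3d, values)
assert np.array_equal(mask_or, mask_isin)

# np.where on the combined mask gives the positions of *any* listed value,
# so the per-value grouping asked for in the question is not preserved
z, y, x = np.where(mask_isin)
print(array3d[z, y, x])   # [3 1 3 3 1]
```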

python find permutation mapping [duplicate]

I have two 1D arrays, x & y, one smaller than the other. I'm trying to find the index of every element of y in x.
I've found two naive ways to do this, the first is slow, and the second memory-intensive.
The slow way
indices = []
for iy in y:
    indices.append(np.where(x == iy)[0][0])
The memory hog
xe = np.outer([1,]*len(x), y)
ye = np.outer(x, [1,]*len(y))
junk, indices = np.where(np.equal(xe, ye))
Is there a faster way or less memory intensive approach? Ideally the search would take advantage of the fact that we are searching for not one thing in a list, but many things, and thus is slightly more amenable to parallelization.
Bonus points if you don't assume that every element of y is actually in x.
I want to suggest one-line solution:
indices = np.where(np.in1d(x, y))[0]
The result is an array of indices into x at which an element of y was found. They come out in the order of x, not of y.
You can drop numpy.where if a Boolean mask is all you need.
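A short sketch of that behaviour with made-up data. It uses np.isin, which the NumPy documentation recommends over np.in1d for new code and which gives the same result for 1-D input:

```python
import numpy as np

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([6, 1, 10])

mask = np.isin(x, y)            # True where an element of x occurs in y
indices = np.flatnonzero(mask)  # same as np.where(mask)[0]

print(indices)   # [3 6 7]: positions in x, listed in x's order
print(x[mask])   # [1 6 6]: the mask can index x directly
```

Note that 10 is silently ignored because it is not in x, and that 6 is reported twice because it occurs twice in x.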
As Joe Kington said, searchsorted() can search for elements very quickly. To deal with elements that are not in x, you can compare the search result with the original y and create a masked array:
import numpy as np
x = np.array([3,5,7,1,9,8,6,6])
y = np.array([2,1,5,10,100,6])
index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)
yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y
result = np.ma.array(yindex, mask=mask)
print(result)
the result is:
[-- 3 1 -- -- 6]
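Once you have the masked array, a few standard numpy.ma calls turn it into something easier to consume. The sketch below repeats the steps above on the same data; kind="stable" and the -1 fill value are choices made for this illustration, not part of the answer:

```python
import numpy as np

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([2, 1, 5, 10, 100, 6])

order = np.argsort(x, kind="stable")
pos = np.searchsorted(x[order], y)
yindex = np.take(order, pos, mode="clip")   # clip guards against pos == len(x)
result = np.ma.array(yindex, mask=x[yindex] != y)

found = ~np.ma.getmaskarray(result)  # True where the element of y exists in x
print(y[found])                      # the values 1, 5 and 6 were found
print(result.compressed())           # their indices in x: 3, 1 and 6
print(result.filled(-1))             # one entry per element of y, -1 if missing
```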
How about this?
It does assume that every element of y is in x, (and will return results even for elements that aren't!) but it is much faster.
import numpy as np
# Generate some example data...
x = np.arange(1000)
np.random.shuffle(x)
y = np.arange(100)
# Actually perform the operation...
xsorted = np.argsort(x)
ypos = np.searchsorted(x[xsorted], y)
indices = xsorted[ypos]
I think this is a clearer version:
np.where(y.reshape(y.size, 1) == x)[1]
It is equivalent to indices = np.where(y[:, None] == x[None, :])[1], but you don't need to expand x to 2D explicitly; broadcasting does that for you.
I found this type of solution to be best because, unlike the searchsorted() or in1d() based solutions that I have seen posted here or elsewhere, it works with duplicates and doesn't care whether anything is sorted. This was important to me because I wanted x to be in a particular custom order.
I would just do this:
indices = np.where(y[:, None] == x[None, :])[1]
Unlike your memory-hog way, this makes use of broadcast to directly generate 2D boolean array without creating 2D arrays for both x and y.
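To make the broadcasting approach concrete, here is a sketch on made-up data with a duplicate and a missing value. The -1 marker for missing values is an assumption chosen for this example:

```python
import numpy as np

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([6, 1, 10])

# (len(y), len(x)) boolean table: row i marks where y[i] occurs in x
matches = y[:, None] == x

y_idx, x_idx = np.where(matches)
print(y_idx)  # [0 0 1]: which element of y each hit belongs to
print(x_idx)  # [6 7 3]: where it was found in x

# First occurrence per element of y, with -1 for values missing from x
first = np.where(matches.any(axis=1), matches.argmax(axis=1), -1)
print(first.tolist())  # [6, 3, -1]
```

The table still holds len(y) * len(x) booleans, so for very large inputs the sort-based answers use less memory.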
The numpy_indexed package (disclaimer: I am its author) contains a function that does exactly this:
import numpy_indexed as npi
indices = npi.indices(x, y, missing='mask')
It will currently raise a KeyError if not all elements in y are present in x; but perhaps I should add a kwarg so that one can elect to mark such items with a -1 or something.
It should have the same efficiency as the currently accepted answer, since the implementation is along similar lines. numpy_indexed is however more flexible, and also allows searching for indices of rows of multidimensional arrays, for instance.
EDIT: I've changed the handling of missing values; the 'missing' kwarg can now be set to 'raise', 'ignore' or 'mask'. In the latter case you get a masked array of the same length as y, on which you can call .compressed() to get the valid indices. Note that there is also npi.contains(x, y) if this is all you need to know.
Another solution would be:
a = np.array(['Bob', 'Alice', 'John', 'Jack', 'Brian', 'Dylan',])
z = ['Bob', 'Brian', 'John']
for i in z:
    print(np.argwhere(i == a))
Use this line of code:
indices = np.where(y[:, None] == x[None, :])[1]
My solution can additionally handle a multidimensional x. By default, it will return a standard numpy array of corresponding y indices in the shape of x.
If you can't assume that y is a subset of x, then set masked=True to return a masked array (this has a performance penalty). Otherwise, you will still get indices for elements not contained in y, but they probably won't be useful to you.
The answers by HYRY and Joe Kington were helpful in making this.
# For each element of ndarray x, return index of corresponding element in 1d array y
# If y contains duplicates, the index of the last duplicate is returned
# Optionally, mask indices where the x element does not exist in y
def matched_indices(x, y, masked=False):
    # Flattened x
    x_flat = x.ravel()
    # Indices to sort y
    y_argsort = y.argsort()
    # Indices in sorted y of corresponding x elements, flat
    x_in_y_sort_flat = y.searchsorted(x_flat, sorter=y_argsort)
    # Indices in y of corresponding x elements, flat
    x_in_y_flat = y_argsort[x_in_y_sort_flat]
    if not masked:
        # Reshape to shape of x
        return x_in_y_flat.reshape(x.shape)
    else:
        # Check for inequality at each y index to mask invalid indices
        mask = x_flat != y[x_in_y_flat]
        # Reshape to shape of x
        return np.ma.array(x_in_y_flat.reshape(x.shape), mask=mask.reshape(x.shape))
A more direct solution that doesn't expect the array to be sorted:
import numpy as np
import pandas as pd
A = pd.Series(['amsterdam', 'delhi', 'chromepet', 'tokyo', 'others'])
B = pd.Series(['chromepet', 'tokyo', 'tokyo', 'delhi', 'others'])
# Find index position of B's items in A
B.map(lambda x: np.where(A==x)[0][0]).tolist()
Result is:
[2, 3, 3, 1, 4]
A more compact solution:
indices, = np.in1d(a, b).nonzero()

Change the first column of the matrix from another specified matrix

I have 2 matrices
x = [[1,2,3],
[4,5,6],
[7,8,9]]
y = [0,2,4]
and I want to replace the first element of each row of x with the corresponding element of y, so that the end result would be
x = [[0,2,3],
[2,5,6],
[4,8,9]]
I have tried this code:
x = [[1,2,3],[4,5,6],[7,8,9]]
y = [0,2,4]
for i in range (len(x)):
    x[i][0] = y[0][i]
print (x)
but it only returns "TypeError: 'int' object is not subscriptable".
Is there any way to fix this, and how do you expand it so that it is applicable to any n*n matrix?
Change x[i][0] = y[0][i] to x[i][0] = y[i].
Another way to do this with fewer indices:
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = [0, 2, 4]
for x_row, y_int in zip(x, y):
x_row[0] = y_int
print(x)
You do not have matrices. x is a list of lists and y is a list. They can represent matrices/vectors/etc., but those are ultimately mathematical abstractions, which can be implemented in code in different ways.
The first way to do it, maintaining the structure of your code, requires taking note of the above fact: as y is a list containing ints, y[0][i] will clearly not work, since y[0] will always be an int, and you cannot further apply the subscript operator to ints.
Accordingly, this will work:
for i in range(len(x)):
    x[i][0] = y[i]
That said, that is not the only way to do it. If you desired a more functional approach, you could do something like this list comprehension:
[[y_value, *x_value[1:]] for x_value, y_value in zip(x, y)]
This gives the same result, but approaches the problem in a more abstract way: the new list will itself contain lists where the first element comes from y and the rest from x. Understanding this, we can instead compose an inner list following this pattern.
zip creates an iterator of pairs of values from x and y. Using this iterator, each value from y can be positioned before the remaining values of the matching row from x. Lastly, since x_value[1:] is a list, it must be unpacked so we get, for example, [0, 2, 3] instead of [0, [2, 3]].
With Python, you would typically avoid using indexes when possible. Since a "matrix" is actually a list of lists, going through rows returns lists which are object references that you can manipulate directly:
for row, value in zip(x, y):
    row[0] = value
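To cover the "any n*n matrix" part of the question, the zip-based idea can be wrapped in a small function. This is only a sketch: the name replace_first_column and the length check are assumptions made for this example, and it builds a new list instead of modifying x in place.

```python
def replace_first_column(matrix, column):
    """Return a new list of rows whose first entries come from `column`.

    Hypothetical helper for illustration; it leaves `matrix` unmodified.
    """
    if len(matrix) != len(column):
        raise ValueError("need exactly one replacement value per row")
    return [[value, *row[1:]] for row, value in zip(matrix, column)]

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = [0, 2, 4]

print(replace_first_column(x, y))  # [[0, 2, 3], [2, 5, 6], [4, 8, 9]]
print(x)                           # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Nothing here depends on the matrix being square; it works for any number of rows and any row length of at least one.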

Extract indices of specified elements in a list

I have a list. Each element is an integer, and I want to extract the indices of a specified element. For example:
import numpy as np
idx = np.where(A==1) #A is a list of [1,1,2,3,4,5....]
But np.where seems not to work for a list.
My next task is to obtain a new list from another list, B, based on the obtained indices:
C = B[idx]
Convert the list A to an ndarray and it should work
idx = np.where(np.array(A)==1)
C = [B[i] for i in idx[0]]
There is no need for numpy IMO. You can create C simply by using something like:
C = [b for a, b in zip(A, B) if a == 1]
If A is a vanilla list (a default list in Python), then Python will interpret:
A == 1
as checking whether the list itself is equal to 1, which is of course False.
You should turn A into an array:
Aa = np.array(A) # construct a numpy array
idx = np.where(Aa == 1) # obtain the indices
B = Aa[idx] # make a copy (again on the numpy array)
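Putting both routes side by side, here is a sketch that finds the indices in A and uses them to pick from B. The contents of A and B are made up for illustration:

```python
import numpy as np

A = [1, 1, 2, 3, 4, 5, 1]
B = ["a", "b", "c", "d", "e", "f", "g"]

# Plain Python: enumerate yields (index, value) pairs
idx = [i for i, value in enumerate(A) if value == 1]
C = [B[i] for i in idx]
print(idx, C)  # [0, 1, 6] ['a', 'b', 'g']

# NumPy: convert both lists, then index B with the positions found in A
idx_np = np.flatnonzero(np.array(A) == 1)
C_np = np.array(B)[idx_np]
print(idx_np.tolist(), C_np.tolist())  # [0, 1, 6] ['a', 'b', 'g']
```

For short lists the plain Python version avoids the conversion cost; the NumPy version pays off when A and B are already arrays or are large.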

List comprehension with condition on the new element

I will be stealing the form of the question there: List comprehension with condition
I have a simple list.
>>> a = [0, 1, 2]
I want to make a new list from it using a list comprehension.
>>> b = [afunction(x) for x in a]
>>> b
[12458, None, 34745]
Pretty simple, but what if I want to operate only over non-None elements? I can do this:
>>> b = [y for y in [afunction(x) for x in a] if y != None]
I would rather be able to do it with a single list comprehension, in a single run. This, I believe, iterates over the list again to filter out the Nones, and nobody really likes that.
You don't need to build the list then iterate over it again, you can consume an iterator of the application of afunction to the elements in a, either using map:
[y for y in map(afunction, a) if y is not None]
or a generator expression (note change in the type of brackets):
[y for y in (afunction(x) for x in a) if y is not None]
#           ^ here                  ^ and here
Note also that these test for None by identity, not equality, per PEP-8.
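A runnable sketch of both forms. The body of afunction is a made-up stand-in (the question does not show it); it returns None for odd inputs so there is something to filter:

```python
def afunction(x):
    """Hypothetical stand-in: returns None for odd inputs, a number otherwise."""
    return None if x % 2 else x * 100

a = [0, 1, 2]

with_map = [y for y in map(afunction, a) if y is not None]
with_genexp = [y for y in (afunction(x) for x in a) if y is not None]

print(with_map)      # [0, 200]
print(with_genexp)   # [0, 200]
```

The result 0 is kept because the filter tests identity with None rather than truthiness, which is one practical reason to prefer "is not None" over a bare "if y".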
