delete all columns of a dimension except for a specific column - python

I want to make a function which takes an n-dimensional array, the dimension and the column index, and returns the (n-1)-dimensional array left after removing all the other columns of that dimension.
Here is the code I am using now:
import numpy as np
a = np.arange(6).reshape((2, 3))  # the n-dimensional array
axisApplied = 1
colToKeep = 0
# every index along axisApplied except colToKeep
colsToDelete = np.delete(np.arange(a.shape[axisApplied]), colToKeep)
a = np.squeeze(np.delete(a, colsToDelete, axisApplied), axis=axisApplied)
print(a)
# [0 3]
Note that I have to compute the indices of all the other columns (the complement of the column index I want to keep) by hand to use np.delete(), and I am wondering whether there is a more convenient way to achieve my goal, e.g. by specifying directly which column to keep.
Thank you for reading; I welcome any suggestions.

In [1]: arr = np.arange(6).reshape(2,3)
In [2]: arr
Out[2]:
array([[0, 1, 2],
[3, 4, 5]])
Simple indexing:
In [3]: arr[:,0]
Out[3]: array([0, 3])
Or, if you need to use a general axis parameter, try np.take:
In [4]: np.take(arr,0,axis=1)
Out[4]: array([0, 3])
Picking one element, or a list of elements, along an axis is a lot easier than deleting some. Look at the code for np.delete.
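A minimal sketch of the function the question asks for, wrapping np.take; keep_index and keep_index_view are hypothetical names. np.take returns a new array, whereas building an index tuple for basic indexing returns a view of the original. Passing [idx] instead of idx keeps the axis with length 1.

```python
import numpy as np

def keep_index(arr, axis, idx):
    """Drop every position along `axis` except `idx`; a scalar idx removes the axis."""
    return np.take(arr, idx, axis=axis)   # always returns a new array

def keep_index_view(arr, axis, idx):
    """Same selection via basic indexing, which returns a view instead of a copy."""
    index = [slice(None)] * arr.ndim   # ':' for every axis ...
    index[axis] = idx                  # ... except the chosen one
    return arr[tuple(index)]

a = np.arange(6).reshape(2, 3)
print(keep_index(a, 1, 0))             # [0 3]
print(keep_index_view(a, 0, 1))        # [3 4 5]
print(np.take(a, [0], axis=1).shape)   # (2, 1): a list index keeps the axis
```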

Related

Most efficient way to implement numpy.in1d for multiple arrays

What is the best way to implement a function which takes an arbitrary number of 1d arrays and returns a tuple containing the indices of the matching values (if any).
Here is some pseudo-code of what I want to do:
a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])
(ind_a, ind_b, ind_c) = return_equals(a, b, c)
# ind_a = [2, 4]
# ind_b = [1, 3]
# ind_c = [0, 1]
(ind_a, ind_b, ind_c) = return_equals(a, b, c, sorted_by=a)
# ind_a = [2, 4]
# ind_b = [3, 1]
# ind_c = [0, 1]
def return_equals(*args, sorted_by=None):
    ...
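You can use numpy.intersect1d with functools.reduce for this:
from functools import reduce
import numpy as np
def return_equals(*arrays):
    # values present in every input array
    matched = reduce(np.intersect1d, arrays)
    # for each array, the positions whose value is in matched
    return tuple(np.where(np.in1d(array, matched))[0] for array in arrays)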
reduce may be a little slow here because it creates an intermediate NumPy array at every step (with a large number of inputs this can get very slow). We can avoid that by using Python's set and its .intersection() method:
matched = np.array(list(set(arrays[0]).intersection(*arrays[1:])))
Related GitHub ticket: n-array versions of set operations, especially intersect1d
This solution concatenates all the input 1D arrays into one big 1D array so that the required operations can be performed in a vectorized manner. The only loop is at the start, where it gets the lengths of the input arrays, and that should cost little at runtime. Note that it assumes no value is repeated within a single input array.
Here's the function implementation:
import numpy as np
def return_equals(*argv):
    # Concatenate input arrays into one big array for vectorized processing
    A = np.concatenate(argv)
    # lengths of input arrays
    narr = len(argv)
    lens = np.zeros(narr, int)
    for i in range(narr):
        lens[i] = len(argv[i])
    N = A.size
    # Start indices of each group of identical elements from different input arrays
    # in a sorted version of the huge concatenated input array
    start_idx = np.where(np.append([True], np.diff(np.sort(A)) != 0))[0]
    # Runlengths of islands of identical elements
    runlens = np.diff(np.append(start_idx, N))
    # Starting and all indices of the positions in the concatenated array that have
    # islands of identical elements which are present across all input arrays
    good_start_idx = start_idx[runlens == narr]
    good_all_idx = good_start_idx[:, None] + np.arange(narr)
    # A stable argsort keeps equal values in input-array order, so column j of
    # good_all_idx comes from argv[j]; subtract each array's offset in A
    idx = np.argsort(A, kind='stable')[good_all_idx] - np.append([0], lens[:-1].cumsum())
    return np.sort(idx.T, 1)
In Python:
def return_equal(*args):
    rtr = []
    for i, arr in enumerate(args):
        # keep index j of arr if its value occurs in every other input
        rtr.append([j for j, e in enumerate(arr) if
                    all(e in a for a in args[0:i]) and
                    all(e in a for a in args[i+1:])])
    return rtr
>>> return_equal(a,b,c)
[[2, 4], [1, 3], [0, 1]]
For a start, I'd try:
def return_equals(*args):
    x = []
    c = args[-1]  # match everything against the last array
    for a in args:
        x.append(np.nonzero(np.in1d(a, c))[0])
    return x
If I add d=np.array([1,0,4,3,0]) (it has only one match; what if there are no matches?), then
return_equals(a,b,d,c)
produces:
[array([2, 4], dtype=int32),
array([1, 3], dtype=int32),
array([2], dtype=int32),
array([0, 1], dtype=int32)]
Since the lengths of both the input and the returned arrays can differ, you really can't vectorize the problem; it takes some special gymnastics to perform the operation across all inputs at once. And if the number of arrays is small compared to their typical length, I wouldn't worry about speed. Iterating a few times over whole arrays is not expensive; it's iterating in Python over hundreds of individual values that's expensive.
You could, of course, pass the keyword arguments on to in1d.
It's not clear what you are trying to do with the sorted_by parameter. Is that something that you could just as easily apply to the arrays before you pass them to this function?
List comprehension version of this iteration:
[np.nonzero(np.in1d(x,c))[0] for x in [a,b,d,c]]
I can imagine concatenating the arrays into one longer one, applying in1d, and then splitting it up into subarrays. There is a np.split, but it requires that you tell it how many elements to put in each sublist. That means, somehow, determining how many matches there are for each argument. Doing that without looping could be tricky.
The pieces for this (they still need to be packaged as a function) are:
args=[a,b,d,c]
lens=[len(x) for x in args]
abc=np.concatenate(args)
C=np.cumsum(lens)
I=np.nonzero(np.in1d(abc,c))[0]
S=np.split(I,(2,4,5))
[S[0],S[1]-C[0],S[2]-C[1],S[3]-C[2]]
I
# array([ 2, 4, 6, 8, 12, 15, 16], dtype=int32)
C
# array([ 5, 10, 15, 17], dtype=int32)
The (2,4,5) are the positions at which np.split cuts I: the cumulative number of matches in a, b and d (2, 2 and 1 respectively), i.e. the number of elements of I that lie below C[0], C[1] and C[2].
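A sketch packaging those pieces as a function. Instead of hard-coding (2,4,5), it assumes np.searchsorted can find where each input's end offset falls in the sorted match positions I. It also uses np.isin in place of np.in1d, which gives the same result for 1-D inputs and is the name NumPy now recommends. Like the loop version, it matches everything against the last argument.

```python
import numpy as np

def return_equals(*args):
    """Indices in each input of the elements that also occur in the last input."""
    lens = [len(x) for x in args]
    abc = np.concatenate(args)
    ends = np.cumsum(lens)                     # one past the end of each input in abc
    starts = ends - lens                       # offset of each input in abc
    I = np.nonzero(np.isin(abc, args[-1]))[0]  # matching positions in abc (ascending)
    # Split points: how many matches fall before the end of each input,
    # i.e. the (2,4,5) above, computed instead of typed in
    cuts = np.searchsorted(I, ends[:-1])
    return [part - off for part, off in zip(np.split(I, cuts), starts)]

a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])
d = np.array([1, 0, 4, 3, 0])
print(return_equals(a, b, d, c))
```

An input with no matches simply produces an empty array in its slot.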

How do I remove the first and last rows and columns from a 2D numpy array?

I'd like to know how to remove the first and last rows and columns from a 2D array in numpy. For example, say we have an (N+1) x (N+1) matrix called H; then in MATLAB/Octave, the code I'd use would be:
Hsub = H(2:N,2:N);
What's the equivalent code in Numpy? I thought that np.reshape might do what I want, but I'm not sure how to get it to remove just the target rows; I think that if I reshape to an (N-1) x (N-1) matrix, it'll remove the last two rows and columns.
How about this?
Hsub = H[1:-1, 1:-1]
The 1:-1 range means that we access elements starting from the second index, 1, and go up to the second-last index, as indicated by the -1. We do this for each dimension independently, and the result is the intersection of the two selections, which chops off the first row, first column, last row and last column.
Remember, the ending index is exclusive, so if we did 0:3 for example, we only get the first three elements of a dimension, not four.
Also, negative indices mean that we access the array from the end. -1 refers to the last element in a particular dimension, but because the end index is exclusive, we get up to the second-last element, not the last one. Essentially, this is the same as doing:
Hsub = H[1:H.shape[0]-1, 1:H.shape[1]-1]
... but using negative indices is much more elegant. You also don't have to use the number of rows and columns to extract what you need: the above syntax works for any matrix size. However, the matrix should be at least 3 x 3; for a smaller one, slicing doesn't raise an error but silently returns an empty array.
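The same slicing generalizes to arrays with any number of dimensions by repeating slice(1, -1) once per axis. A minimal sketch; trim_border is a hypothetical helper name:

```python
import numpy as np

def trim_border(arr):
    # slice(1, -1) on every axis drops the first and last entry along each one
    return arr[(slice(1, -1),) * arr.ndim]

H = np.arange(25).reshape(5, 5)
Hsub = trim_border(H)
print(Hsub.shape)                               # (3, 3)
print(np.array_equal(Hsub, H[1:-1, 1:-1]))      # True
print(trim_border(np.zeros((4, 5, 6))).shape)   # (2, 3, 4)
```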
Small bonus
In MATLAB / Octave, you can achieve the same thing without using the dimensions by:
Hsub = H(2:end-1, 2:end-1);
The end keyword, when used in indexing, refers to the last element of a particular dimension.
Example use
Here's an example (using IPython):
In [1]: import numpy as np
In [2]: H = np.meshgrid(np.arange(5), np.arange(5))[0]
In [3]: H
Out[3]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
In [4]: Hsub = H[1:-1,1:-1]
In [5]: Hsub
Out[5]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
As you can see, the first row, first column, last row and last column have been removed from the source matrix H and the remainder has been placed in the output matrix Hsub.

Finding the minimum value in a numpy array and the corresponding values for the rest of that array's row

Consider the following NumPy array:
a = np.array([[1,4], [2,1],(3,10),(4,8)])
This gives an array that looks like the following:
array([[ 1, 4],
[ 2, 1],
[ 3, 10],
[ 4, 8]])
What I'm trying to do is find the minimum value of the second column (which in this case is 1), and then report the other value of that pair (in this case 2). I've tried using something like argmin, but that gets tripped up by the 1 in the first column.
Is there a way to do this easily? I've also considered sorting the array, but I can't seem to get that to work in a way that keeps the pairs together. The data is being generated by a loop like the following, so if there's an easier way to do this that doesn't use a numpy array, I'd take that as an answer too:
results = np.zeros((100,2))
# Loop over search range, change kappa each time
for i in range(100):
    results[i,0] = function1(x)
    results[i,1] = function2(y)
How about
a[np.argmin(a[:, 1]), 0]
Break-down
a. Grab the second column
>>> a[:, 1]
array([ 4, 1, 10, 8])
b. Get the index of the minimum element in the second column
>>> np.argmin(a[:, 1])
1
c. Index a with that to get the corresponding row
>>> a[np.argmin(a[:, 1])]
array([2, 1])
d. And take the first element
>>> a[np.argmin(a[:, 1]), 0]
2
Using np.argmin is probably the best way to tackle this. To do it in pure Python (this returns the whole pair, with ties in the second column broken by the first), you could use:
min(tuple(r[::-1]) for r in a)[::-1]
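Since the question says a non-NumPy answer would also be fine: as a sketch, if the loop appended (function1(x), function2(y)) tuples to a list instead of filling results, min with a key function does the lookup directly. The list literal below just reuses the question's example data.

```python
# The question's data as a plain list of (first, second) pairs
pairs = [(1, 4), (2, 1), (3, 10), (4, 8)]

# key= makes min compare only the second value of each pair; on a tie it
# returns the first such pair in the list
best_pair = min(pairs, key=lambda p: p[1])
print(best_pair)      # (2, 1)
print(best_pair[0])   # 2
```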

Indexing NumPy 2D array with another 2D array

I have something like
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried np.ix_, as I read in "Simplfy row AND column extraction, numpy", but it did not give me what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.
The numpy way to do this is by using np.choose or fancy indexing/take (see below):
import numpy as np
from numpy import array
m = array([[1, 2],
           [4, 5],
           [7, 8],
           [6, 2]])
select = array([0,1,0,0])
result = np.choose(select, m.T)
So there is no need for Python loops, and you get all the speed advantages NumPy gives you. m.T is needed because choose really selects between separate arrays, here the two columns, as in np.choose(select, (m[:,0], m[:,1])); passing m.T is a straightforward way to supply them.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important, use np.take, which indexes the flattened (1D) array (it is often quite a bit faster, though maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))
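A quick sketch checking that the three approaches above select the same elements. Here np.arange(0, m.size, m.shape[1]) is just another way to write the row offsets used in the take line: each row starts m.shape[1] elements later in the flattened array.

```python
import numpy as np

m = np.array([[1, 2],
              [4, 5],
              [7, 8],
              [6, 2]])
select = np.array([0, 1, 0, 0])

by_choose = np.choose(select, m.T)                     # element i comes from column select[i]
by_fancy = m[np.arange(len(select)), select]           # explicit (row, col) pairs
flat_idx = select + np.arange(0, m.size, m.shape[1])   # row start + column, in flattened m
by_take = m.take(flat_idx)

print(by_choose, by_fancy, by_take)   # [1 5 7 6] [1 5 7 6] [1 5 7 6]
```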
I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
1. create the condition array;
2. generate an index array by calling NP.where, passing in this condition array; and
3. apply this index array against the source array.
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
[False, True],
[ True, False],
[ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array; here it is the result of a single compound conditional expression (first line above).
If constructing such a value filter doesn't apply to your particular use case, that's fine: you just need to generate the actual boolean matrix (the value of cnd) some other way, or create it directly.
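A sketch of the same three steps with a plain ndarray (the session above used np.matrix, hence the matrix output). Because np.where returns the True positions in row-major order, m[rows, cols] yields one value per row only when each row of the condition has exactly one True, so the sketch checks that first.

```python
import numpy as np

m = np.array([[1, 2],
              [4, 5],
              [7, 8],
              [6, 2]])
cnd = (m == 1) | (m == 5) | (m == 7) | (m == 6)   # step 1: condition array

rows, cols = np.where(cnd)                # step 2: indices of the True entries
assert np.all(cnd.sum(axis=1) == 1)       # one match per row, so results align with rows
print(m[rows, cols])                      # step 3: [1 5 7 6]
print(cols)                               # [0 1 0 0], the question's select array
```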
What about using plain Python?
result = array([subarray[index] for subarray, index in zip(m, select)])
IMHO, this is the simplest variant:
m[np.arange(4), select]
Since the title is referring to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices named inds, of shape (n,m) with arbitrarily large m, is used to access elements of another 2D array named B, of shape (n,k):
# offset of the start of each row of B in its flattened form (multiples of k)
offset = np.arange(0, B.size, B.shape[1])
# with no axis, np.take(B, C) indexes B as if it were flattened; the result has the shape of C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
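For completeness: NumPy 1.15 and later also provide np.take_along_axis, a different built-in function from the two approaches above, which performs this per-row lookup directly. A sketch with made-up B and inds; it also checks the offset/np.take construction, with the row stride taken from B.shape[1].

```python
import numpy as np

B = np.arange(12).reshape(3, 4)        # shape (n, k) = (3, 4), example data
inds = np.array([[0, 3],
                 [2, 2],
                 [1, 0]])              # shape (n, m) = (3, 2), example data

picked = np.take_along_axis(B, inds, axis=1)
print(picked.tolist())                 # [[0, 3], [6, 6], [9, 8]]

# Same result with the offset/np.take trick; rows of flattened B are k apart
offset = np.arange(0, B.size, B.shape[1])
print(np.array_equal(picked, np.take(B, offset[:, np.newaxis] + inds)))   # True
```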
result = array([m[j][0] if i==0 else m[j][1] for i,j in zip(select, range(0, len(m)))])

Python (Numpy) array sorting

I've got this array, named v, of dtype('float64'):
array([[ 9.33350000e+05, 8.75886500e+06, 3.45765000e+02],
[ 4.33350000e+05, 8.75886500e+06, 6.19200000e+00],
[ 1.33360000e+05, 8.75886500e+06, 6.76650000e+02]])
... which I've acquired from a file using np.loadtxt. I would like to sort it by the values of the first column, without mixing up the structure that keeps the numbers on the same line together. Using v.sort(axis=0) gives me:
array([[ 1.33360000e+05, 8.75886500e+06, 6.19200000e+00],
[ 4.33350000e+05, 8.75886500e+06, 3.45765000e+02],
[ 9.33350000e+05, 8.75886500e+06, 6.76650000e+02]])
... i.e. it sorts each column independently, so the smallest number of the third column ends up in the first line, etc. I would rather have something like this...
array([[ 1.33360000e+05, 8.75886500e+06, 6.76650000e+02],
[ 4.33350000e+05, 8.75886500e+06, 6.19200000e+00],
[ 9.33350000e+05, 8.75886500e+06, 3.45765000e+02]])
... where the elements of each line haven't been moved relative to each other.
Try
v[v[:,0].argsort()]
(with v being the array). v[:,0] is the first column, and .argsort() returns the indices that would sort the first column. You then apply this ordering to the whole array using advanced indexing. Note that you get a sorted copy of the array.
The only way I know of to sort the array in place is to use a record dtype:
v.dtype = [("x", float), ("y", float), ("z", float)]
v.shape = v.size
v.sort(order="x")
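As a hedged alternative to assigning v.dtype and v.shape directly, the same record trick can be applied through a view: it sorts v's rows in v's own memory while v keeps its float64 dtype and (3, 3) shape. A minimal sketch using the question's values:

```python
import numpy as np

v = np.array([[9.3335e5, 8.758865e6, 3.45765e2],
              [4.3335e5, 8.758865e6, 6.192e0],
              [1.3336e5, 8.758865e6, 6.7665e2]])
expected = v[v[:, 0].argsort()]    # sorted copy, for comparison

# View each row of three floats as one record; the (3, 3) array becomes a
# (3, 1) structured view sharing v's memory, and v's own dtype is untouched
rec = v.view([("x", float), ("y", float), ("z", float)])
rec.sort(order="x", axis=0)        # sorts the records, i.e. v's rows, in place

print(np.array_equal(v, expected))  # True
print(v.dtype, v.shape)             # float64 (3, 3)
```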
Alternatively
Try
import numpy as np
order = v[:, 0].argsort()
v_sorted = np.take(v, order, 0)
'order' holds the sorting order of the first column,
and 'np.take' then takes the rows of v in that order (axis 0).
See the help of 'np.take' as
help(np.take)
take(a, indices, axis=None, out=None, mode='raise')
Take elements from an array along an axis.
This function does the same thing as "fancy" indexing (indexing arrays
using arrays); however, it can be easier to use if you need elements
along a given axis.
Parameters
----------
a : array_like
The source array.
indices : array_like
The indices of the values to extract.
axis : int, optional
The axis over which to select values. By default, the flattened
input array is used.
out : ndarray, optional
If provided, the result will be placed in this array. It should
be of the appropriate shape and dtype.
mode : {'raise', 'wrap', 'clip'}, optional
Specifies how out-of-bounds indices will behave.
* 'raise' -- raise an error (default)
* 'wrap' -- wrap around
* 'clip' -- clip to the range
'clip' mode means that all indices that are too large are replaced by the index that addresses the last element along that axis. Note that this disables indexing with negative numbers.
Returns
-------
subarray : ndarray
The returned array has the same type as `a`.
See Also
--------
ndarray.take : equivalent method
Examples
--------
>>> a = [4, 3, 5, 7, 6, 8]
>>> indices = [0, 1, 4]
>>> np.take(a, indices)
array([4, 3, 6])
In this example if `a` is an ndarray, "fancy" indexing can be used.
>>> a = np.array(a)
>>> a[indices]
array([4, 3, 6])
If you have instances where v[:,0] has some identical values and you want to secondarily sort on columns 1, 2, etc., then you'll want to use numpy.lexsort() or numpy.sort(v, order=('col1', 'col2', etc.)); for the order= case, v needs to be a structured array.
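A sketch of the order= route on a regular float array: a structured view gives the columns field names without copying, and np.argsort accepts the field order for tie-breaking. The data and the field names col1 to col3 are illustrative assumptions.

```python
import numpy as np

# Example data with a tie in the first column (two rows start with 1.0)
v = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.0, 4.0],
              [0.0, 4.0, 1.0]])

# Reinterpret each row of three floats as one record with named fields;
# this is a view of v's memory with shape (3,), not a copy
rec = v.view([("col1", float), ("col2", float), ("col3", float)]).ravel()

# Row order by col1, with ties broken by col2
order = np.argsort(rec, order=["col1", "col2"])
print(v[order].tolist())   # [[0.0, 4.0, 1.0], [1.0, 0.0, 4.0], [1.0, 2.0, 3.0]]
```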
Here is an example application of numpy.lexsort() that sorts the rows of an array and deals with ties in the first column. Note that lexsort uses its last key as the primary one, so you need to reverse the column order of a (np.flip(a, axis=1)) and transpose it before the lexsort, so that the original first column becomes the last key. The final .T in the code is applied to the 1-D index array returned by lexsort and has no effect (you'd have thought this should be easier, but hey!):
In [1]: import numpy as np
In [2]: a = np.array([[1,2,3,4],[1,0,4,1],[0,4,1,1]])
In [3]: a[np.lexsort(np.flip(a, axis=1).T).T]
Out[3]:
array([[0, 4, 1, 1],
[1, 0, 4, 1],
[1, 2, 3, 4]])
In [4]: a
Out[4]:
array([[1, 2, 3, 4],
[1, 0, 4, 1],
[0, 4, 1, 1]])
Thanks go to @Paul for the suggestion to use lexsort.
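The same sort can be written a little more compactly by passing lexsort the columns in reverse order. A sketch reusing the example array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [1, 0, 4, 1],
              [0, 4, 1, 1]])

# lexsort treats its LAST key as the primary one, so pass the columns in
# reverse: column 0 becomes the primary key, column 1 the first tie-breaker, ...
keys = a.T[::-1]
print(a[np.lexsort(keys)].tolist())   # [[0, 4, 1, 1], [1, 0, 4, 1], [1, 2, 3, 4]]
```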
