Slice an array by providing the string ':' as an index in NumPy - python

I want to extract all of the data from an array. The simplest way would be simply passing array[:]. However, I want to automate this as part of a larger project where the index varies with the data format. Is it therefore possible to extract the whole data set by passing the slicing string ":" as an index to the array?
To make things clear, here is an example of what I am trying to do.
Create an array:
>>> import numpy as np
>>> a = np.random.randint(0,10,(5,5))
>>> a
array([[3, 3, 3, 7, 2],
       [8, 6, 8, 6, 3],
       [4, 2, 2, 0, 3],
       [4, 0, 6, 0, 1],
       [1, 2, 0, 2, 8]])
General slicing of the data set using the traditional method:
>>> a[:]
array([[3, 3, 3, 7, 2],
       [8, 6, 8, 6, 3],
       [4, 2, 2, 0, 3],
       [4, 0, 6, 0, 1],
       [1, 2, 0, 2, 8]])
It works. However, I intend to make : a variable and try to extract the data from the above example as shown below:
>>> b = ":"
>>> a[b]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Going by the printed error, I made a correction while defining the variable, as indicated below, which also resulted in an error:
>>> c = slice(':')
>>> a[c]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: slice indices must be integers or None or have an __index__ method
So, is it possible at all to extract data by passing a slicing string ":" as an index to an array?
Update
Thank you all for your comments. It is possible with the following method:
>>> c = np.index_exp[:]
>>> a[c]
array([[3, 3, 3, 7, 2],
       [8, 6, 8, 6, 3],
       [4, 2, 2, 0, 3],
       [4, 0, 6, 0, 1],
       [1, 2, 0, 2, 8]])
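For reference, the bare colon is just shorthand for slice(None), so a[slice(None)] also returns the whole array, and np.index_exp[:] simply wraps that slice in a tuple. If the index really arrives as a string, a minimal sketch of a parser could look like the following (the helper name str_to_slice is made up for illustration, and it assumes the string contains only optional integers separated by colons):
def str_to_slice(s):
    # "2" -> 2, ":" -> slice(None, None), "1:3" -> slice(1, 3), "::2" -> slice(None, None, 2)
    parts = s.split(':')
    if len(parts) == 1:
        return int(parts[0])
    return slice(*[int(p) if p else None for p in parts])

a[str_to_slice(":")]    # equivalent to a[:]
a[str_to_slice("1:3")]  # equivalent to a[1:3]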

Related

Address of last value in 1d NumPy array

I have a 1d array with zeros scattered throughout. I would like to create a second array which contains the position of the last zero, like so:
>>> a = np.array([1, 0, 3, 2, 0, 3, 5, 8, 0, 7, 12])
>>> foo(a)
[0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
Is there a built-in NumPy function or broadcasting trick to do this without using a for loop or other iterator?
>>> (a == 0).cumsum()
array([0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
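As an aside, if the literal index of the most recent zero is wanted rather than a running count, a loop-free sketch (using -1 to mark positions before the first zero) could be:
>>> idx = np.arange(len(a))
>>> np.maximum.accumulate(np.where(a == 0, idx, -1))
array([-1,  1,  1,  1,  4,  4,  4,  4,  8,  8,  8])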

Is there a way to generate a list of indices using numpy

Can I use numpy to generate repeating patterns of indices? For example:
0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 0, 11, 12, 13, 14, 15
or
0,1,2,1,2,3,4,5,6,5,6,7
Is there a method in numpy I can use to generate these lists between a range?
Currently I am doing this using lists in Python, but I was curious whether I could use numpy to speed things up.
I am not sure what methods to even look into other than numpy.arange.
Just to clarify further: I am generating indices for triangles in OpenGL in various patterns.
So, for triangles in a circle, I have some code like this:
for fan_set in range(0, len(self.vertices) / vertex_length, triangle_count):
    for i in range(fan_set + 1, fan_set + 8):
        self.indices.append(fan_set)
        self.indices.append(i)
        self.indices.append(i + 1)
Your first example can be produced via numpy methods as:
In [860]: np.concatenate((np.zeros((3,1),int),np.arange(1,16).reshape(3,5)),axis=1).ravel()
Out[860]:
array([ 0,  1,  2,  3,  4,  5,  0,  6,  7,  8,  9, 10,  0, 11, 12, 13, 14,
       15])
That's because I see this 2d repeated pattern:
array([[ 0,  1,  2,  3,  4,  5],
       [ 0,  6,  7,  8,  9, 10],
       [ 0, 11, 12, 13, 14, 15]])
The second pattern can be produced by ravel of this 2d array (produced by broadcasting 2 arrays):
In [863]: np.array([0,1,4,5])[:,None]+np.arange(3)
Out[863]:
array([[0, 1, 2],
       [1, 2, 3],
       [4, 5, 6],
       [5, 6, 7]])
I can produce the 1st pattern with a variation on the 2nd (the initial column of 0s disrupts the pattern):
I = np.array([0, 5, 10])[:, None] + np.arange(0, 6)
I[:, 0] = 0
I think your double loop can be expressed as a list comprehension:
In [872]: np.array([ [k,i,i+1] for k in range(0,1,1) for i in range(k+1,k+8)]).ravel()
Out[872]: array([0, 1, 2, 0, 2, 3, 0, 3, 4, 0, 4, 5, 0, 5, 6, 0, 6, 7, 0, 7, 8])
or without the ravel:
array([[0, 1, 2],
       [0, 2, 3],
       [0, 3, 4],
       [0, 4, 5],
       [0, 5, 6],
       [0, 6, 7],
       [0, 7, 8]])
though I don't know what parameters produce your examples.
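As a hedged follow-up sketch, the second pattern can be parametrized for n pairs of triangles using the same broadcasting idea, assuming the start offsets continue as 0, 1, 4, 5, 8, 9, ...:
def strip_pattern(n):
    # start offsets 0, 1, 4, 5, ... for n pairs, each expanded to 3 consecutive indices
    starts = (np.arange(2 * n) // 2) * 4 + (np.arange(2 * n) % 2)
    return (starts[:, None] + np.arange(3)).ravel()

strip_pattern(2)
# array([0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7])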
I'm not sure I understand exactly what you mean, but the following is what I use to generate unique indices for 3D points:
def indexate(points):
    """
    Convert a numpy array of points into a list of indices and an array of
    unique points.

    Arguments:
        points: A numpy array of shape (N, 3).

    Returns:
        An array of indices and an (M, 3) array of unique points.
    """
    pd = {}
    indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
    pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
    unique = np.array([i[1] for i in pt])
    return np.array(indices, np.uint16), unique
You can find this code in my stltools package on github.
It works like this:
In [1]: import numpy as np
In [2]: points = np.array([[1,0,0], [0,0,1], [1,0,0], [0,1,0]])
In [3]: pd = {}
In [4]: indices = [pd.setdefault(tuple(p), len(pd)) for p in points]
In [5]: indices
Out[5]: [0, 1, 0, 2]
In [6]: pt = sorted([(v, k) for k, v in pd.items()], key=lambda x: x[0])
In [7]: pt
Out[7]: [(0, (1, 0, 0)), (1, (0, 0, 1)), (2, (0, 1, 0))]
In [8]: unique = np.array([i[1] for i in pt])
In [9]: unique
Out[9]:
array([[1, 0, 0],
       [0, 0, 1],
       [0, 1, 0]])
The key point (if you'll pardon the pun) is to use a tuple of the point (because a tuple is immutable and thus hashable) as the key in a dictionary, via the setdefault method, with the current length of the dict as the value. In effect, the value records the order in which each distinct point was first seen.
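For reference, on NumPy 1.13 and newer a similar index/unique pair can also be obtained with np.unique and its axis argument, though the unique points then come back in sorted order rather than in order of first appearance (and the exact return details can vary slightly across versions):
unique_alt, indices_alt = np.unique(points, axis=0, return_inverse=True)
# unique_alt -> the unique rows (sorted); indices_alt -> index into unique_alt for each input point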
I am not 100% certain this is what you're after, but I think you can achieve it using a pair of range values, incrementing by n times 3 (the gap between each group), and then using numpy.concatenate to build the final array, like this:
import numpy as np

def gen_list(n):
    return np.concatenate([np.array(list(range(i, i + 3)) + list(range(i + 1, i + 4))) + i * 3
                           for i in range(n)])
Usage:
gen_list(2)
Out[16]: array([0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7])
gen_list(3)
Out[17]:
array([ 0,  1,  2,  1,  2,  3,  4,  5,  6,  5,  6,  7,  8,  9, 10,  9, 10,
       11])
list(gen_list(2))
Out[18]: [0, 1, 2, 1, 2, 3, 4, 5, 6, 5, 6, 7]
In my sample, n is just the number of groups you want to generate; you may change this to suit your triangle requirements.

Slicing a 2D numpy array in python

What's wrong with the code below?
arr=numpy.empty((2,2))
arr[0:,0:]=1
print(arr[1:,1:])
arr=([ [1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ])
print(arr[1:2, 1])
I am getting the following error and am not able to slice the array (fifth line). Please help me with this.
TypeError: list indices must be integers, not tuple.
You rebind the name arr to point to a Python list in your fourth line, so your question title doesn't quite fit: you're not slicing a 2d numpy array. Lists can't be sliced the way that numpy arrays can. Compare:
>>> arr= numpy.array([ [1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ])
>>> arr
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> arr[1:2, 1]
array([5])
but
>>> arr.tolist()
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> arr.tolist()[1:2, 1]
Traceback (most recent call last):
File "<ipython-input-23-4a441cf2eaa9>", line 1, in <module>
arr.tolist()[1:2, 1]
TypeError: list indices must be integers, not tuple
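If you do need to stay with a plain nested list, the closest equivalent is an explicit comprehension rather than a tuple index; a small sketch:
>>> lst = arr.tolist()
>>> [row[1] for row in lst[1:2]]
[5]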
arr = ([ [1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ]) is a Python list, not a numpy array.
You reassign arr to a list with arr = ([ [1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ]).
Make it a numpy array:
In [37]: arr = numpy.array([ [1, 2, 3], [ 4, 5, 6], [ 7, 8, 9] ])
In [38]: arr
Out[38]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [39]: (arr[1:2, 1])
Out[39]: array([5])

Replace subarrays in numpy

Given an array,
>>> n = 2
>>> a = numpy.array([[[1,1,1],[1,2,3],[1,3,4]]]*n)
>>> a
array([[[1, 1, 1],
        [1, 2, 3],
        [1, 3, 4]],

       [[1, 1, 1],
        [1, 2, 3],
        [1, 3, 4]]])
I know that it's possible to replace values in it succinctly like so,
>>> a[a==2] = 0
>>> a
array([[[1, 1, 1],
        [1, 0, 3],
        [1, 3, 4]],

       [[1, 1, 1],
        [1, 0, 3],
        [1, 3, 4]]])
Is it possible to do the same for an entire row (last axis) in the array? I know that a[a==[1,2,3]] = 11 will work and replace all the elements of the matching subarrays with 11, but I'd like to substitute a different subarray. My intuition tells me to write the following, but an error results,
>>> a[a==[1,2,3]] = [11,22,33]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: array is not broadcastable to correct shape
In summary, what I'd like to get is:
array([[[ 1,  1,  1],
        [11, 22, 33],
        [ 1,  3,  4]],

       [[ 1,  1,  1],
        [11, 22, 33],
        [ 1,  3,  4]]])
... and n of course is, in general, a lot larger than 2, and the other axes are also larger than 3, so I don't want to loop over them if I don't need to.
Update: The [1,2,3] (or whatever else I'm looking for) is not always at index 1. An example:
a = numpy.array([[[1,1,1],[1,2,3],[1,3,4]], [[1,2,3],[1,1,1],[1,3,4]]])
You can achieve this with much higher performance by using np.all to check whether all the columns hold True for your comparison, then using the resulting mask to replace the values:
mask = np.all(a == [1, 2, 3], axis=2)
a[mask] = [11, 22, 33]
print(a)
#array([[[ 1,  1,  1],
#        [11, 22, 33],
#        [ 1,  3,  4]],
#
#       [[ 1,  1,  1],
#        [11, 22, 33],
#        [ 1,  3,  4]]])
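The same mask-based approach also covers the case from the update, where [1,2,3] is not always at index 1; a quick sketch:
a = np.array([[[1,1,1],[1,2,3],[1,3,4]], [[1,2,3],[1,1,1],[1,3,4]]])
mask = np.all(a == [1, 2, 3], axis=2)  # shape (2, 3): True wherever a row equals [1, 2, 3]
a[mask] = [11, 22, 33]                 # every matching row is replaced, regardless of its position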
You have to do something a little more complicated to achieve what you want.
You can't select slices of arrays as such, but you can select all the specific indexes you want.
So first you need to construct an array that represents the rows you wish to select, i.e.:
data = numpy.array([[1,2,3],[55,56,57],[1,2,3]])
to_select = numpy.array([1,2,3]*3).reshape(3,3) # three rows of [1,2,3]
selected_indices = data == to_select
# array([[ True,  True,  True],
#        [False, False, False],
#        [ True,  True,  True]], dtype=bool)
data = numpy.where(selected_indices, [4,5,6], data)
# array([[ 4,  5,  6],
#        [55, 56, 57],
#        [ 4,  5,  6]])
# done in one step, but perhaps not very clear as to its intent
data = numpy.where(data == numpy.array([1,2,3]*3).reshape(3,3), [4,5,6], data)
numpy.where works by selecting from the second argument if true and the third argument if false.
You can pass three different kinds of replacement values to where. The first is an array with the same shape as selected_indices; the second is just a single value on its own (like 2 or 7); the third, and most flexible, is an array whose shape can be broadcast into the shape of selected_indices. In this case we provided [4,5,6], which is stacked row-wise to give a 3x3 array.
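For example, the scalar case (a small illustrative sketch using the original data and selected_indices from above) would be:
numpy.where(selected_indices, 0, numpy.array([[1, 2, 3], [55, 56, 57], [1, 2, 3]]))
# array([[ 0,  0,  0],
#        [55, 56, 57],
#        [ 0,  0,  0]])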
Not sure if this is what you want; your code example does not create the array you say it does. But:
>>> a = np.array([[[1,1,1],[1,2,3],[1,3,4]], [[1,1,1],[1,2,3],[1,3,4]]])
>>> a
array([[[1, 1, 1],
        [1, 2, 3],
        [1, 3, 4]],

       [[1, 1, 1],
        [1, 2, 3],
        [1, 3, 4]]])
>>> a[:,1,:] = [[8, 8, 8], [8,8,8]]
>>> a
array([[[1, 1, 1],
        [8, 8, 8],
        [1, 3, 4]],

       [[1, 1, 1],
        [8, 8, 8],
        [1, 3, 4]]])
>>> a[:,1,:] = [88, 88, 88]
>>> a
array([[[ 1,  1,  1],
        [88, 88, 88],
        [ 1,  3,  4]],

       [[ 1,  1,  1],
        [88, 88, 88],
        [ 1,  3,  4]]])

Removing duplicate columns and rows from a NumPy 2D array

I'm using a 2D array to store pairs of longitudes and latitudes. At one point, I have to merge two of these 2D arrays and then remove any duplicated entries. I've been searching for a function similar to numpy.unique, but I've had no luck. Any implementation I've been thinking of looks very "unoptimized". For example, I'm trying to convert the array to a list of tuples, remove duplicates with set, and then convert back to an array:
coordskeys = np.array(list(set([tuple(x) for x in coordskeys])))
Are there any existing solutions, so I do not reinvent the wheel?
To make it clear, I'm looking for:
>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1], [2, 3], [5, 4]])
BTW, I wanted to use just a list of tuples for it, but the lists were so big that they consumed my 4Gb RAM + 4Gb swap (numpy arrays are more memory efficient).
This should do the trick:
def unique_rows(a):
    a = np.ascontiguousarray(a)
    unique_a = np.unique(a.view([('', a.dtype)] * a.shape[1]))
    return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))
Example:
>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique_rows(a)
array([[1, 1],
       [2, 3],
       [5, 4]])
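For reference, NumPy 1.13 and later also support this directly via the axis argument of np.unique (the rows come back in sorted order):
>>> np.unique(a, axis=0)
array([[1, 1],
       [2, 3],
       [5, 4]])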
Here's one idea; it'll take a little bit of work but could be quite fast. I'll give you the 1d case and let you figure out how to extend it to 2d. The following function finds the unique elements of a 1d array:
import numpy as np

def unique(a):
    a = np.sort(a)
    b = np.diff(a)
    b = np.r_[1, b]
    return a[b != 0]
Now to extend it to 2d you need to change two things. You will need to figure out how to do the sort yourself; the important thing about the sort is that two identical entries end up next to each other. Second, you'll need to do something like (b != 0).any(axis=1), because you want to compare whole rows/columns. Let me know if that's enough to get you started.
Updated: With some help from doug, I think this should work for the 2d case.
import numpy as np

def unique(a):
    order = np.lexsort(a.T)
    a = a[order]
    diff = np.diff(a, axis=0)
    ui = np.ones(len(a), 'bool')
    ui[1:] = (diff != 0).any(axis=1)
    return a[ui]
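Applied to the example array from the question, this gives:
>>> a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
>>> unique(a)
array([[1, 1],
       [2, 3],
       [5, 4]])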
My method is to turn the 2d array into a 1d complex array, where the real part is the 1st column and the imaginary part is the 2nd column, and then use np.unique. However, this only works with 2 columns.
import numpy as np

def unique2d(a):
    x, y = a.T
    b = x + y * 1.0j
    idx = np.unique(b, return_index=True)[1]
    return a[idx]
Example:
a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
unique2d(a)
array([[1, 1],
       [2, 3],
       [5, 4]])
>>> import numpy as NP
>>> # create a 2D NumPy array with some duplicate rows
>>> A
array([[1, 1, 1, 5, 7],
       [5, 4, 5, 4, 7],
       [7, 9, 4, 7, 8],
       [5, 4, 5, 4, 7],
       [1, 1, 1, 5, 7],
       [5, 4, 5, 4, 7],
       [7, 9, 4, 7, 8],
       [5, 4, 5, 4, 7],
       [7, 9, 4, 7, 8]])
>>> # first, sort the 2D NumPy array row-wise so dups will be contiguous
>>> # and rows are preserved
>>> a, b, c, d, e = A.T  # create the keys to pass to lexsort
>>> ndx = NP.lexsort((a, b, c, d, e))
>>> ndx
array([1, 3, 5, 7, 0, 4, 2, 6, 8])
>>> A = A[ndx,]
>>> # now diff by row
>>> A1 = NP.diff(A, axis=0)
>>> A1
array([[0, 0, 0, 0, 0],
       [4, 3, 3, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [2, 5, 0, 2, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
>>> # the index array holding the location of each duplicate row
>>> ndx = NP.any(A1, axis=1)
>>> ndx
array([False, True, False, True, True, True, False, False], dtype=bool)
>>> # retrieve the duplicate rows:
>>> A[1:,:][ndx,]
array([[7, 9, 4, 7, 8],
       [1, 1, 1, 5, 7],
       [5, 4, 5, 4, 7],
       [7, 9, 4, 7, 8]])
The numpy_indexed package (disclaimer: I am its author) wraps the solution posted by user545424 in a nice and tested interface, plus many related features:
import numpy_indexed as npi
npi.unique(coordskeys)
Since you refer to numpy.unique, you don't care about maintaining the original order, correct? Converting to a set, which removes duplicates, and then back to a list is an often-used idiom:
>>> x = [(1, 1), (2, 3), (1, 1), (5, 4), (2, 3)]
>>> y = list(set(x))
>>> y
[(5, 4), (2, 3), (1, 1)]
>>>
