I have two numpy arrays, one larger, one smaller:
a = np.array([[0,1,0],[0,0,1],[0,1,1]])
b = np.array([[0],[1]])
Is there a function I can use to find the indexes of the larger array where there is an instance of the smaller one?
Ideal result:
instances[0] = [[2, 0], [2, 1]]
instances[1] = [[1, 1], [1, 2]]
Many thanks!
As far as I know there is no fast numpy function that will do this, but you can loop through and check pretty quickly.
import numpy as np

def find_instances(a, b):
    instances = []
    for i in range(a.shape[0] - b.shape[0] + 1):
        for j in range(a.shape[1] - b.shape[1] + 1):
            # compare the window of a anchored at (i, j) against b
            if np.all(a[i:i+b.shape[0], j:j+b.shape[1]] == b):
                instances.append([i, j])
    return instances
Here each instance is the spot in the top left corner of a that matches the top left corner of b. Not quite the output you requested, but it's easy enough to derive the rest of the indices from there if you really need them. Hope that helps!
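For the arrays from the question, a quick check of this sketch; the expansion step below is one hypothetical way to recover every index covered by a match:

a = np.array([[0, 1, 0], [0, 0, 1], [0, 1, 1]])
b = np.array([[0], [1]])

corners = find_instances(a, b)   # [[0, 2], [1, 1]]

# expand each top-left corner to all indices covered by b
instances = [[[i + di, j + dj]
              for di in range(b.shape[0])
              for dj in range(b.shape[1])]
             for i, j in corners]
# [[[0, 2], [1, 2]], [[1, 1], [2, 1]]]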
I am trying to split a list into I sublists, where the size of each sublist is random (with at least one entry; assume P > I). I used the numpy.split function, which works fine but does not satisfy my randomness condition. You may ask which distribution the randomness should follow; I think it should not matter. I checked several posts, but they were not equivalent to mine, as they were trying to split into almost equally sized chunks. If this is a duplicate, let me know. Here is my approach:
import numpy as np
P = 10
I = 5
mylist = range(1, P + 1)
[list(x) for x in np.split(np.array(mylist), I)]
This approach collapses when P is not divisible by I. Further, it creates equal sized chunks, not probabilistically sized chunks. Another constraint: I do not want to use the package random but I am fine with numpy. Don't ask me why; I wish I had a logical response for it.
Based on the answer provided by the mad scientist, this is the code I tried:
P = 10
I = 5
data = np.arange(P) + 1
indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
result = np.split(data, indices)
result
Output:
[array([1, 2]),
array([3, 4, 5, 6]),
array([], dtype=int32),
array([4, 5, 6, 7, 8, 9]),
array([10])]
The problem can be refactored as choosing I-1 random split points from {1,2,...,P-1}, which can be viewed using stars and bars.
Therefore, it can be implemented as follows:
import numpy as np

P = 10
I = 5
data = np.arange(P) + 1

# choose I - 1 distinct split points from {1, ..., P - 1}
split_points = np.random.choice(P - 1, I - 1, replace=False) + 1
split_points.sort()
result = np.split(data, split_points)
np.split is still the way to go. If you pass in a sequence of integers, split will treat them as cut points. Generating random cut points is easy. You can do something like
P = 10
I = 5
data = np.arange(P) + 1
indices = np.random.randint(1, P, size=I - 1)
You want I - 1 cut points to get I chunks. The cut points need to be sorted, and duplicates need to be removed; np.unique does both for you. You may still end up with fewer than I chunks this way:
result = np.split(data, np.unique(indices))
If you absolutely need to have I chunks, choose the cut points without replacement. That can be done, for example, via np.random.shuffle:
indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
indices.sort()
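Putting the pieces together (a minimal sketch; the output shown is just one possible draw):

import numpy as np

P = 10
I = 5
data = np.arange(P) + 1

indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
indices.sort()

result = np.split(data, indices)
# one possible output:
# [array([1]), array([2, 3, 4]), array([5, 6]), array([7, 8]), array([9, 10])]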
I am trying to translate some code from MATLAB to Python. I have been stumped on this part of the MATLAB code:
[L,N] = size(Y);
if (L<p)
error('Insufficient number of columns in y');
end
I understand that [L,N] = size(Y) returns the number of rows and columns when Y is a matrix. However, I have limited experience with Python and cannot work out how to do the same there. That is also why I do not understand how the MATLAB logic inside the if-block can be reproduced in Python.
Thank you in advance!
Also, in case the rest of the code is needed, here it is.
function [M,Up,my,sing_values] = mvsa(Y,p,varargin)
if (nargin-length(varargin)) ~= 2
error('Wrong number of required parameters');
end
% data set size
[L,N] = size(Y)
if (L<p)
error('Insufficient number of columns in y');
end
I am still unclear as to what p is from your post; however, the excerpt below performs the same task as your MATLAB code in Python. Using numpy, you can represent a matrix as a 2-D array and then call .shape to get the number of rows and columns, respectively.
import numpy as np

p = 2
Y = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
L, N = Y.shape
if L < p:
    raise ValueError('Insufficient number of columns in y')
Non-numpy
data = [[1, 2], [3, 4], [5, 6]]
L, N = len(data), len(data[0])
p = 2
if L < p:
    raise ValueError("Insufficient number of columns in y")
number_of_rows = len(Y)
number_of_cols = len(Y[0])
I am working with data from netcdf files, with multi-dimensional variables, read into numpy arrays. I need to scan all values in all dimensions (axes in numpy) and alter some values. But, I don't know in advance the dimension of any given variable. At runtime I can, of course, get the ndims and shapes of the numpy array.
How can I program a loop through all values without knowing the number of dimensions or shapes in advance? If I knew a variable was exactly 2-dimensional, I would do
shp = myarray.shape
for i in range(shp[0]):
    for j in range(shp[1]):
        do_something(myarray[i][j])
You should look into ravel, nditer and ndindex.
# For the simple case (read-only iteration over every value)
for value in np.nditer(a):
    do_something_with(value)

# This is similar to the above
for value in a.ravel():
    do_something_with(value)

# Or if you need the index
for idx in np.ndindex(a.shape):
    a[idx] = do_something_with(a[idx])
On an unrelated note, numpy arrays are indexed a[i, j] instead of a[i][j]. In Python, a[i, j] is equivalent to indexing with a tuple, i.e. a[(i, j)].
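If you need to write values back while iterating, note that np.nditer is read-only by default; a minimal sketch of the writable variant:

import numpy as np

a = np.arange(6).reshape(2, 3)

# request write access, then assign through the iterator with value[...]
with np.nditer(a, op_flags=['readwrite']) as it:
    for value in it:
        value[...] = value * 2
# a is now [[0, 2, 4], [6, 8, 10]]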
You can use the flat attribute of numpy arrays, which returns an iterator over all values (no matter the shape).
For instance:
>>> A = np.array([[1, 2, 3], [4, 5, 6]])
>>> for x in A.flat:
...     print(x)
1
2
3
4
5
6
You can also set the values in the same order they're returned, e.g. like this:
>>> A.flat[:] = [x // 2 if x % 2 == 0 else x for x in A.flat]
>>> A
array([[1, 1, 3],
[2, 5, 3]])
Note that flat iterates in row-major (C-style) order, the same order as a.ravel(), regardless of how the array is laid out in memory, so the order is well defined. And this will work for any number of dimensions. Expressions like row1 = A.flat[:N] therefore behave predictably, though explicit indexing is usually clearer.
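For example, iterating over a transposed view follows the view's row-major order, not the underlying memory order:

>>> A = np.array([[1, 2, 3], [4, 5, 6]])
>>> list(A.T.flat)
[1, 4, 2, 5, 3, 6]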
This might be easiest with recursion:

import numpy
a = numpy.arange(30).reshape(5, 3, 2)

def recursive_do_something(array):
    if len(array.shape) == 1:
        for obj in array:
            do_something(obj)
    else:
        for subarray in array:
            recursive_do_something(subarray)

recursive_do_something(a)
In case you want the indices:

import numpy
a = numpy.arange(30).reshape(5, 3, 2)

def do_something(x, indices):
    print(indices, x)

def recursive_do_something(array, indices=None):
    indices = indices or []
    if len(array.shape) == 1:
        for obj in array:
            do_something(obj, indices)
    else:
        for i, subarray in enumerate(array):
            recursive_do_something(subarray, indices + [i])

recursive_do_something(a)
Look into Python's itertools module.
Python 2: http://docs.python.org/2/library/itertools.html#itertools.product
Python 3: http://docs.python.org/3.3/library/itertools.html#itertools.product
This will allow you to do something along the lines of

from itertools import product

for idx in product(*(range(dim) for dim in shp)):
    do_something(myarray[idx])
Given a one-dimensional array of n elements, how would you efficiently rotate the array so that its elements shift to the left by m positions? Is it possible to do this in O(n) time using only constant O(1) memory?
For example if n=8 and your array is [0, 1, 2, 3, 4, 5, 6, 7] and you rotate it to the left by m=2, you get [2, 3, 4, 5, 6, 7, 0, 1].
Here is the naive solution I implemented in Python, which uses O(n) time and O(n) memory with a temporary array.
def rotateLeft(A, m):
    temp = [None] * len(A)
    for i in range(len(temp)):
        temp[i] = A[(i + m) % len(A)]
    for i in range(len(A)):
        A[i] = temp[i]
How could I do this more efficiently? I was told this could be done with a constant amount of memory and still in O(n) time.
Solutions in any language are okay and any suggestions are more than welcome.
EDIT: I am not looking for library solutions. Additionally, the array is not a linked list/deque. There is no notion of head/tail/next/previous elements.
Let's look at the reversed final array:
[1, 0,    7, 6, 5, 4, 3, 2] (spacing mine)
Do you see something interesting?
Try to think about what such a solution would look like. If you can't use more space, the only available move is to swap elements. Try to do it with a pen and paper, first with a 2-element array, then a 3-element one. After you get the idea it should be quite easy.
In fact, a swap needs one extra variable; you can avoid even that using the XOR swap algorithm (http://en.wikipedia.org/wiki/XOR_swap_algorithm), but I don't think this is really important.
This isn't trivial due to the memory constraint. Start by moving the first element to its new place, and since you can't store many elements, continue by finding a place for the element you just evicted.
Now think about how the number of passes is related to GCD(n, m), and how your algorithm should reflect that. Start with the common case where the gcd is 1 (e.g. if m=3 in your example above): once the chain of replacements is over (you can check by comparing the current index with the one you started from), you'll have finished the task. For GCD(n, m) > 1, however, you will only have moved part of the elements, and you'll need to start a new chain with the element right after the one the previous chain started with.
Now convince yourself that the overall number of moves is O(n), regardless of the number of chains.
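A minimal Python sketch of this cycle-chasing idea (the function name and the use of math.gcd are my choices for illustration):

from math import gcd

def rotate_left_cycles(A, m):
    """Rotate A left by m positions in O(n) time with O(1) extra memory."""
    n = len(A)
    m %= n
    if m == 0:
        return
    # there are gcd(n, m) independent cycles; chase each one
    for start in range(gcd(n, m)):
        saved = A[start]
        i = start
        while True:
            j = (i + m) % n   # index of the element that moves into slot i
            if j == start:
                break
            A[i] = A[j]
            i = j
        A[i] = saved

A = [0, 1, 2, 3, 4, 5, 6, 7]
rotate_left_cycles(A, 2)
print(A)   # [2, 3, 4, 5, 6, 7, 0, 1]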
Look at the following pseudocode. It chases each replacement cycle in place; note that there are gcd(n, m) such cycles, so it has to restart once per cycle:
function rotateLeft(a[], n, m)
{
    cycles = gcd(n, m);
    for (start = 0; start < cycles; start++) {
        temp = a[start];
        i = start;
        while (true) {
            k = (i + m) % n;   // index of the element that moves into slot i
            if (k == start)
                break;
            a[i] = a[k];
            i = k;
        }
        a[i] = temp;
    }
}
This is one constant-memory solution, which was hinted to me by @MBo. It uses 3 extra variables in addition to the array and m: i, midpoint, and endpoint. It loops 3 times, first reversing the two subarrays and then reversing the entire array in the final loop. The cost is O(n/2 + m/2 + (n-m)/2), which is just O(n), since 0 <= m < n after the m %= len(A) reduction at the beginning brings any given m into the array's range.
Each swap also sends the values through a 2-element tuple, but that is still constant memory per swap, so it doesn't matter.
def rotateLeft(A, m):
    m %= len(A)
    # Reverse A[0..m-1]
    midpoint = m // 2
    endpoint = m - 1
    for i in range(midpoint):
        A[i], A[endpoint-i] = A[endpoint-i], A[i]
    # Reverse A[m..n-1]
    midpoint = m + (len(A) - m) // 2
    endpoint = len(A) - 1
    for i in range(m, midpoint):
        A[i], A[endpoint-(i-m)] = A[endpoint-(i-m)], A[i]
    # Reverse all elements of the array in place
    midpoint = len(A) // 2
    endpoint = len(A) - 1
    for i in range(midpoint):
        A[i], A[endpoint-i] = A[endpoint-i], A[i]
It also allows negative rotations (rotations to the right), which is really neat in my opinion. This means that rotateRight can be implemented in the following way.
def rotateRight(A, m):
    rotateLeft(A, -m)
The following code will then pass the assertion check just fine.
A = [0, 1, 2, 3, 4, 5, 6]
B = A[:] # Make copy of A and assign it to B
rotateLeft(A, 4)
rotateRight(A, 4)
assert A == B
Let's say I have the arrays:
a = array((1,2,3,4,5))
indices = array((1,1,1,1))
and I perform the operation:
a[indices] += 1
the result is
array([1, 3, 3, 4, 5])
In other words, the duplicates in indices are ignored.
If I wanted the duplicates not to be ignored, resulting in:
array([1, 6, 3, 4, 5])
how would I go about this?
The example above is somewhat trivial; what follows is exactly what I am trying to do:
def inflate(self, pressure):
    faceforces = pressure * cross(self.verts[self.faces[:,1]] - self.verts[self.faces[:,0]],
                                  self.verts[self.faces[:,2]] - self.verts[self.faces[:,0]])
    self.verts[self.faces[:,0]] += faceforces
    self.verts[self.faces[:,1]] += faceforces
    self.verts[self.faces[:,2]] += faceforces

def constrain_lengths(self):
    vectors = self.verts[self.constraints[:,1]] - self.verts[self.constraints[:,0]]
    lengths = sqrt(sum(square(vectors), axis=1))
    correction = 0.5 * (vectors.T * (1 - (self.restlengths / lengths))).T
    self.verts[self.constraints[:,0]] += correction
    self.verts[self.constraints[:,1]] -= correction

def compute_normals(self):
    facenormals = cross(self.verts[self.faces[:,1]] - self.verts[self.faces[:,0]],
                        self.verts[self.faces[:,2]] - self.verts[self.faces[:,0]])
    self.normals.fill(0)
    self.normals[self.faces[:,0]] += facenormals
    self.normals[self.faces[:,1]] += facenormals
    self.normals[self.faces[:,2]] += facenormals
    lengths = sqrt(sum(square(self.normals), axis=1))
    self.normals = (self.normals.T / lengths).T
I've been getting some very buggy results because duplicates are ignored in my indexed assignment operations.
numpy's histogram function is a scatter operation.
a += histogram(indices, bins=a.size, range=(0, a.size))[0]
You may need to take some care here: because the integer indices fall exactly on the bin edges, small rounding errors could put values into the wrong bucket. In that case use:
a += histogram(indices, bins=a.size, range=(-0.5, a.size-0.5))[0]
to put each index at the centre of its bin.
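For the arrays from the question, this looks like the following (note that np.histogram also accepts a weights argument if you need to add values other than 1):

import numpy as np

a = np.array([1, 2, 3, 4, 5])
indices = np.array([1, 1, 1, 1])

# each integer index lands in the centre of its own bin, so duplicates are counted
counts = np.histogram(indices, bins=a.size, range=(-0.5, a.size - 0.5))[0]
a += counts
print(a)   # [1 6 3 4 5]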
Update: this works, but I recommend using @Eelco Hoogendoorn's answer based on numpy.add.at.
Slightly late to the party, but seeing how commonly this operation is required, and the fact that it still does not seem to be part of standard numpy, I'll put my solution here for reference:
import numpy as np
from scipy.sparse import coo_matrix

def scatter(rowidx, vals, target):
    """Compute target[rowidx] += vals, allowing for repeated values in rowidx."""
    rowidx = np.ravel(rowidx)
    vals = np.ravel(vals)
    cols = len(vals)
    data = np.ones(cols)
    colidx = np.arange(cols)
    rows = len(target)
    # the sparse matrix-vector product sums duplicate row indices for us
    M = coo_matrix((data, (rowidx, colidx)), shape=(rows, cols))
    target += M * vals

def gather(idx, vals):
    """For symmetry with scatter."""
    return vals[idx]
A custom C routine in numpy could easily be twice as fast still, eliminating for starters the superfluous allocation of, and multiplication with, the array of ones; but it makes a world of difference in performance versus a loop in Python.
Aside from performance considerations, it is stylistically much more in line with other numpy-vectorized code to use a scatter operation, rather than mash some for loops in your code.
Edit: OK, forget about the above. As of the 1.8 release, scatter operations are directly supported in numpy at optimal efficiency.
def scatter(idx, vals, target):
    """target[idx] += vals, but allowing for repeats in idx"""
    np.add.at(target, idx.ravel(), vals.ravel())
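Applied to the arrays from the question:

import numpy as np

a = np.array([1, 2, 3, 4, 5])
indices = np.array([1, 1, 1, 1])

# np.add.at applies the increment once per occurrence of each index
np.add.at(a, indices, 1)
print(a)   # [1 6 3 4 5]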
I don't know of a way to do it that is any faster than:

for i, face in enumerate(self.faces[:,0]):
    self.verts[face] += faceforces[i]
You could also turn self.faces into an array of 3 dictionaries, where the keys correspond to the faces and the values to the number of times each needs to be added. You'd then get code like:

for face in self.faces[0]:
    self.verts[face] += self.faces[0][face] * faceforces

which might be faster. I do hope that someone comes up with a better way, because I wanted to do this when trying to help someone speed up their code earlier today.