A toy-case for my problem:
I have a numpy array of size, say, 1000:
import numpy as np
a = np.arange(1000)
I also have a "projection array" p which is a mapping from a to another array b:
p = np.random.randint(0,1000,(1000,1000))
It is easy to get b from a using "fancy indexing":
b = a[p]
But b is not a view, as noted by several previous questions/answers and the numpy documentation.
Unfortunately, in my case only the values in a change over the course of a long simulation and using fancy indexing at each iteration to obtain b becomes very costly. I only read from b and do not modify it.
I understand it is not possible (yet) to solve this with fancy indexing.
I was wondering if anyone had a similar problem/bottleneck and came up with some other workaround?
What you're asking for isn't practical, and that's why the numpy folks haven't implemented it. You could do it yourself with something like:
class FancyView(object):
    def __init__(self, array, index):
        self._array = array
        self._index = index.copy()

    def __array__(self):
        return self._array[self._index]

    def __getitem__(self, index):
        return self._array[self._index[index]]

b = FancyView(a, p)
But notice that the expensive a[p] operation will get called every time you use b as an array. There is no other practical way of making a 'view' of this kind. Numpy can get away with using views for basic slicing because it can manipulate the strides, but there is no way to do something like this using strides.
If you only need parts of b you might be able to get some time savings by indexing the fancy view instead of using it as an array.
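As a rough illustration of the point about indexing only part of b, the sketch below repeats the FancyView class on a smaller array. The `dtype` and `copy` parameters on `__array__` are an addition (an assumption made so the class works with both older and newer NumPy versions); everything else follows the class above.

```python
import numpy as np

class FancyView(object):
    def __init__(self, array, index):
        self._array = array
        self._index = index.copy()

    # dtype/copy are optional so np.asarray() can call this on any NumPy version
    def __array__(self, dtype=None, copy=None):
        result = self._array[self._index]
        return result if dtype is None else result.astype(dtype)

    def __getitem__(self, index):
        # Only the requested part of the index array is looked up in a
        return self._array[self._index[index]]

a = np.arange(10)
p = np.random.randint(0, 10, (10, 10))
b = FancyView(a, p)

row_before = b[3]        # fancy-indexes a with a single row of p
a[:] = a * 2             # update a in place, as the simulation would
row_after = b[3]         # reflects the new values of a
full = np.asarray(b)     # this still performs the full a[p] lookup
```

Each `b[3]` call still makes a copy, but only of one row, so it is cheaper than rebuilding all of `a[p]`.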
Related
I am trying to figure out the best way to do leave one out indexing with numpy, this is the desired behaviour:
import numpy as np
a = np.random.randint(0,10,size=10)
print(a)
def fun(x, xs):
    print(x, xs)  # do some stuff

for i in range(a.shape[0]):
    fun(a[i], a[np.arange(a.shape[0]) != i])  # this is all I can think of, but it's horrid!
Is there a nicer, more efficient way to do this?
EDIT: To clarify, a question that is hopefully a bit clearer:
I have an array and I want a view that has 1 or more elements missing in the middle, e.g. a = [1,2,3,4,5,...] to a = [1,2,4,5,...]. According to here, fancy indexing / masking makes a copy of the array. I want to avoid this, and avoid creating a large index array. Thanks in advance for the help!
I've been experimenting with Numba lately, and here's something that I still cannot understand:
In a normal Python function with NumPy arrays you can do something like this:
# Subtracts two NumPy arrays and returns an array as the result
def sub(a, b):
    res = a - b
    return res
But, when you use Numba's @guvectorize decorator like so:
# Subtracts two NumPy arrays and returns an array as the result
@guvectorize(['void(float32[:], float32[:], float32[:])'],'(n),(n)->(n)')
def subT(a, b, res):
    res = a - b
The result is not even correct. Worse still, there are instances where it complains about "Invalid usage of [math operator] with [parameters]"
I am baffled. Even if I try this:
# Subtracts two NumPy arrays and returns an array as the result
@guvectorize(['void(float32[:], float32[:], float32[:])'],'(n),(n)->(n)')
def subTt(a, b, res):
    res = np.subtract(a,b)
The result is still incorrect. Considering that this is supposed to be a supported Math operation, I don't see why it doesn't work.
I know the standard way is like this:
# Subtracts two NumPy arrays and returns an array as the result
@guvectorize(['void(float32[:], float32[:], float32[:])'],'(n),(n)->(n)')
def subTtt(a, b, res):
    for i in range(a.shape[0]):
        res[i] = a[i] - b[i]
and this does work as expected.
But what is wrong with my way?
P/S This is just a trivial example to explain my problem, I don't actually plan to use @guvectorize just to subtract arrays :P
P/P/S I suspect it has something to do with how the arrays are copied to gpu memory, but I am not sure...
P/P/P/S This looked relevant but the function here operates only on a single thread right...
The correct way to write this is:
@guvectorize(['void(float32[:], float32[:], float32[:])'],'(n),(n)->(n)')
def subT(a, b, res):
    res[:] = a - b
The reason what you tried didn't work is a limitation of Python's assignment semantics, not something particular to numba.
name = expr rebinds name to the value of expr; it can never mutate the object that name previously referred to, as you could with, e.g., C++ references.
name[index] = expr calls (in essence) name.__setitem__, which can be used to modify the object in place, as numpy arrays do. The full slice [:] refers to the whole array.
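The difference between rebinding and in-place assignment can be seen without numba at all. The sketch below uses two plain Python functions (`rebind` and `assign_in_place` are hypothetical names chosen for illustration) that receive an output array the same way a guvectorize kernel does.

```python
import numpy as np

def rebind(a, b, res):
    # Rebinds the local name res to a new array; the caller's array is untouched
    res = a - b

def assign_in_place(a, b, res):
    # Calls res.__setitem__, writing into the array the caller passed in
    res[:] = a - b

a = np.array([5, 7, 9], dtype=np.float32)
b = np.array([1, 2, 3], dtype=np.float32)

out_rebind = np.zeros(3, dtype=np.float32)
out_in_place = np.zeros(3, dtype=np.float32)

rebind(a, b, out_rebind)
assign_in_place(a, b, out_in_place)

print(out_rebind)    # still all zeros
print(out_in_place)  # holds a - b
```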
I want to generate symmetric zero-diagonal matrices. The symmetric part works, but when I use fill_diagonal from numpy, the result I get is "None". My code is below. Thank you for reading.
import numpy as np
matrix_size = int(input("Size of the matrix \n"))
random_matrix = np.random.random_integers(-4,4,size=(matrix_size,matrix_size))
symmetric_matrix = (random_matrix + random_matrix.T)/2
print(symmetric_matrix)
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
print(zero_diogonal_matrix)
np.fill_diagonal(), like many other functions across Python/numpy, works in place (compare: Why does “return list.sort()” return None, not the list?). That is, it directly alters the object in memory and does not create a new object. The return value from such functions is None. Therefore, change:
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
To just:
np.fill_diagonal(symmetric_matrix,0)
You will then see the change reflected in symmetric_matrix.
It's probably overkill, but in case you want to preserve the tenet of minimising surprise, you could wrap this (and other functions like it) in a function that takes care of preserving the original array:
def fill_diagonal(source_array, diagonal):
    copy = source_array.copy()
    np.fill_diagonal(copy, diagonal)
    return copy
But the question then becomes "who exactly is going to be least surprised by doing it this way?"
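A small check of both behaviours, using a fixed 3x3 matrix instead of the random one from the question. The wrapper is the one shown above; the variable names are chosen for illustration only.

```python
import numpy as np

def fill_diagonal(source_array, diagonal):
    # Work on a copy so the caller's array is left unchanged
    copy = source_array.copy()
    np.fill_diagonal(copy, diagonal)
    return copy

original = np.ones((3, 3))

# The wrapper returns a new array and leaves the original alone
zeroed = fill_diagonal(original, 0)

# The numpy function modifies its argument and returns None
in_place = np.ones((3, 3))
return_value = np.fill_diagonal(in_place, 0)

print(return_value)  # None
print(in_place)      # diagonal is now zero
```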
I am trying to solve a "very simple" problem. Not so simple in Python. Given a large matrix A and another smaller matrix B I want to substitute certain elements of A with B.
In Matlab it would look like this:
Given A, row_coord = [1,5,6] col_coord = [2,4], and a matrix B of size(3X2), A[row_coord, col_coord] = B
In Python I tried to use product(row_coord, col_coord) from the itertools to generate the set of all indexes that need to be accessible in A but it does not work. All examples on submatrix substitution refer to block-wise row_coord = col_coord examples. Nothing concrete except for the http://comments.gmane.org/gmane.comp.python.numeric.general/11912 seems to relate to the problem that I am facing and the code in the link does not work.
Note: I know that I can implement what I need via the double for-loop, but on my data such a loop adds 9 secs to the run of one iteration and I am looking for a faster way to implement this.
Any help will be greatly appreciated.
Assuming you're using numpy arrays then (in the case where your B is a scalar) the following code should work to assign the chosen elements to the value of B.
itertools.product will create all of the coordinate pairs which we then convert into a numpy array and use in indexing your original array:
import numpy as np
from itertools import product
A = np.zeros([20,20])
col_coord = [0,1,3]
row_coord = [1,2]
coords = np.array(list(product(row_coord, col_coord)))
B = 1
A[coords[:,0], coords[:,1]] = B
I used this excellent answer by unutbu to work out how to do the indexing.
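For the case in the question where B is a 3x2 matrix rather than a scalar, one option that is not part of the answer above is np.ix_, which builds an open mesh from the row and column lists so the selected block has the same shape as B. This is a minimal sketch with made-up sizes and values:

```python
import numpy as np

A = np.zeros((8, 8))
row_coord = [1, 5, 6]
col_coord = [2, 4]

# B must have shape (len(row_coord), len(col_coord))
B = np.arange(1, 7).reshape(3, 2)

# np.ix_ turns the two lists into index arrays that broadcast to a 3x2 block
A[np.ix_(row_coord, col_coord)] = B

print(A[np.ix_(row_coord, col_coord)])
```

This avoids both the double for-loop and the explicit list of coordinate pairs.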
I wonder if anyone has an elegant solution to being able to pass a python list, a numpy vector (shape(n,)) or a numpy vector (shape(n,1)) to a function. The idea would be to generalize a function such that any of the three would be valid without adding complexity.
Initial thoughts:
1) Use a type checking decorator function and cast to a standard representation.
2) Add type checking logic inline (significantly less ideal than #1).
3) ?
I do not generally use python builtin array types, but suspect a solution to this question would also support those.
I think the simplest thing to do is to start off your function with numpy.atleast_2d. A list or a shape (n,) vector becomes shape (1, n), while a shape (n, 1) vector is left unchanged, so follow it with reshape(-1, 1) to bring all 3 of your possibilities to the x.shape == (n, 1) case, and you can use that to simplify your function.
For example,
def sum(x):
    x = np.atleast_2d(x).reshape(-1, 1)
    return np.dot(x.T, np.ones((x.shape[0], 1)))
atleast_2d returns a view on that array, so there won't be much overhead if you pass in something that's already an ndarray. However, if you plan to modify x and therefore want to make a copy instead, you can do x = np.atleast_2d(np.array(x)).
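A quick way to see what atleast_2d does with each of the three input types is to print the resulting shapes. The sketch below also checks the view behaviour with np.shares_memory; the variable names are illustrative only.

```python
import numpy as np

as_list = [1.0, 2.0, 3.0]
as_vector = np.array([1.0, 2.0, 3.0])         # shape (3,)
as_column = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1)

shape_from_list = np.atleast_2d(as_list).shape
shape_from_vector = np.atleast_2d(as_vector).shape
shape_from_column = np.atleast_2d(as_column).shape

# For an existing ndarray the result shares memory with the input (a view)
is_view = np.shares_memory(as_vector, np.atleast_2d(as_vector))

# Reshaping afterwards gives one common column shape for all three
common = [np.atleast_2d(x).reshape(-1, 1).shape
          for x in (as_list, as_vector, as_column)]

print(shape_from_list, shape_from_vector, shape_from_column, is_view, common)
```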
You can convert the three types to a "canonical" type, which is a 1dim array, using:
arr = np.asarray(arr).ravel()
Put in a decorator:
import numpy as np
import functools
def takes_1dim_array(func):
    @functools.wraps(func)
    def f(arr, *a, **kw):
        arr = np.asarray(arr).ravel()
        return func(arr, *a, **kw)
    return f
Then:
@takes_1dim_array
def func(arr):
    print(arr.shape)
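To check that the decorator treats all three input types the same way, here is a self-contained sketch. `shape_of` is a hypothetical function used only for the demonstration; it returns the shape instead of printing it so the results can be compared.

```python
import functools
import numpy as np

def takes_1dim_array(func):
    @functools.wraps(func)
    def wrapper(arr, *args, **kwargs):
        # Convert lists, (n,) arrays and (n, 1) arrays to a flat 1-D array
        return func(np.asarray(arr).ravel(), *args, **kwargs)
    return wrapper

@takes_1dim_array
def shape_of(arr):
    return arr.shape

shapes = [
    shape_of([1, 2, 3]),          # Python list
    shape_of(np.zeros(3)),        # shape (3,)
    shape_of(np.zeros((3, 1))),   # shape (3, 1)
]
print(shapes)
```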