I have a 1-dimensional NumPy array and want to store sparse updates to it.
Say I have an array of length 500000 and want to do 100 updates of 100 elements each. Updates are either additions or plain value changes (I do not think it matters).
What is the best way to do it using numpy?
My idea was to store just two arrays, indices and values_to_add, so that I have two objects: one holds the dense array and the other keeps the indices and the values to add. I can then apply an update to the dense array like this:
dense_matrix[indices] += values_to_add
And if I have multiple updates, I just concatenate them.
But this NumPy syntax does not handle repeated indices well: the repeats are not accumulated, so only one update per index takes effect.
Merging an update into the stored pair of arrays when it repeats an index is O(n). I thought of using a dict instead of arrays to store the updates, which looks fine in terms of complexity, but it does not look like good NumPy style.
What is the most expressive way to achieve this? I know about scipy sparse objects, but (1) I want pure numpy because (2) I want to understand the most efficient way to implement it.
If you have repeated indices, you could use np.add.at (the ufunc.at method). From the documentation:
Performs unbuffered in place operation on operand ‘a’ for elements
specified by ‘indices’. For addition ufunc, this method is equivalent
to a[indices] += b, except that results are accumulated for elements
that are indexed more than once.
Code
import numpy as np
a = np.arange(10)
indices = [0, 2, 2]
# Index 2 appears twice, so -55 is added to a[2] twice.
np.add.at(a, indices, [-44, -55, -55])
print(a)
Output
[ -44    1 -108    3    4    5    6    7    8    9]
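For the original goal of keeping several sparse updates and applying them together, one possible sketch is to concatenate the index and value arrays and, optionally, merge repeated indices first with np.unique and np.bincount. The helper name coalesce, the updates list, and the tiny array sizes are illustrative assumptions, not something from the question.

```python
import numpy as np

def coalesce(indices, values):
    """Hypothetical helper: sum the values that share an index."""
    unique_idx, inverse = np.unique(indices, return_inverse=True)
    # inverse maps every original position to its slot in unique_idx,
    # so a weighted bincount adds up the values per unique index.
    summed = np.bincount(inverse, weights=values, minlength=unique_idx.size)
    return unique_idx, summed

dense = np.zeros(10)

# Two sparse updates stored as (indices, values_to_add) pairs.
updates = [
    (np.array([0, 2, 2]), np.array([1.0, 2.0, 3.0])),
    (np.array([2, 9]), np.array([4.0, 5.0])),
]
all_idx = np.concatenate([idx for idx, _ in updates])
all_val = np.concatenate([val for _, val in updates])

# After coalescing there are no repeated indices, so plain += is safe.
idx, val = coalesce(all_idx, all_val)
dense[idx] += val

# Cross-check against the unbuffered np.add.at on the raw, repeated indices.
check = np.zeros(10)
np.add.at(check, all_idx, all_val)
```

Both routes should give the same dense array. Coalescing costs a sort inside np.unique, which may or may not pay off compared with calling np.add.at directly; that would need measuring on the real sizes.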
Related
Any efficient way to merge one tensor to another in Pytorch, but on specific indexes.
Here is my full problem.
I have a list of row indexes into a tensor; in the code below, xy is the original tensor.
I need to preserve the rows of xy that are in the indexes list and apply some function to the other rows (for simplicity, let's say the function is "multiply them by two"),
xy = torch.rand(100,4)
indexes=[1,2,55,44,66,99,3,65,47,88,99,0]
Then merge them back into the original tensor.
This is what I have done so far:
I create a mask tensor
import numpy as np
import torch
indexes=[1,2,55,44,66,99,3,65,47,88,99,0]
xy = torch.rand(100,4)
# Build a boolean mask: False for rows to preserve, True for rows to modify.
mask=[]
for i in range(0,xy.shape[0]):
    if i in indexes:
        mask.append(False)
    else:
        mask.append(True)
print(mask)
target_mask = torch.from_numpy(np.array(mask, dtype=bool))
print(target_mask.sum()) # the sum is 89: the rows other than the preserved ones
Apply the function on masked rows
zy = xy[target_mask]
print(zy)
zy=zy*2
print(zy)
The code above works fine; it is posted here to clarify the problem.
Now I want to merge the tensor zy back into xy at the rows that are not in the list indexes.
Here is the pseudocode I came up with. As one can see, it is too complex and needs three for loops to complete the task, which wastes a lot of resources.
# pseudocode
for masked_row in indexes:
    for xy_rows_index in xy:
        if xy_rows_index == masked_row:
            pass
        else:
            take zy tensor row and replace here  # another loop to read zy
But I am not sure what an efficient way to merge them is. I don't want to use NumPy or for loops, because they would make the process slow: the original tensor is very large and I am going to use a GPU.
Any efficient way in Pytorch for this?
Once you have your mask you can assign updated values in place.
zy = 2 * xy[target_mask]
xy[target_mask] = zy
As for acquiring the mask I don't see a problem necessarily with your approach, though using the built-in set operations would probably be more efficient. This also gives an index tensor instead of a mask, which, depending on the number of indices being updated, may be more efficient.
i = list(set(range(len(xy)))-set(indexes))
zy = 2 * xy[i]
xy[i] = zy
Edit:
To address the comment, specifically to find the complement of indices of i we can do
i_complement = list(set(range(len(xy)))-set(i))
However, assuming indexes contains only values between 0 and len(xy)-1, we could equivalently use i_complement = list(set(indexes)), which just removes the repeated values in indexes.
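If the Python loop and the NumPy round trip used to build the mask are a concern, a minimal sketch that stays entirely in PyTorch might look like the following. It assumes that every entry of indexes is a valid row number of xy; the name original exists only for the check at the end.

```python
import torch

indexes = [1, 2, 55, 44, 66, 99, 3, 65, 47, 88, 99, 0]
xy = torch.rand(100, 4)
original = xy.clone()  # kept only to verify the result below

# Start with "modify every row", then switch off the rows to preserve.
# Repeated entries in indexes (99 appears twice) are harmless here.
mask = torch.ones(xy.shape[0], dtype=torch.bool, device=xy.device)
mask[torch.tensor(indexes, device=xy.device)] = False

# Apply the function to the unprotected rows and write them back in place.
xy[mask] = xy[mask] * 2

print(int(mask.sum()))                           # number of modified rows
print(torch.equal(xy[~mask], original[~mask]))   # preserved rows unchanged
```

Because the mask is created on the same device as xy, the same lines should work unchanged when xy lives on a GPU.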
I am very new to Python, and I am trying to get used to performing Python's array operations rather than looping through arrays. Below is an example of the kind of looping operation I am doing, but am unable to work out a suitable pure array operation that does not rely on loops:
import numpy as np
def f(arg1, arg2):
    # an arbitrary function (body omitted in the question)
    ...
def myFunction(a1DNumpyArray):
    A = a1DNumpyArray
    # Create a square array with each dimension the size of the argument array.
    B = np.zeros((A.size, A.size))
    # Function f is a function of two elements of the 1D array. For each
    # element, i, I want to perform the function on it and every element
    # before it, and store the result in the square array, multiplied by
    # the difference between the ith and (i-1)th element.
    for i in range(A.size):
        B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
    # Sum through j and return full sums as 1D array.
    return np.sum(B, axis=0)
In short, I am integrating a function which takes two elements of the same array as arguments, returning an array of results of the integral.
Is there a more compact way to do this, without using loops?
The use of an arbitrary f function, and this [i, :i] business, complicates bypassing the loop.
Most of the fast compiled numpy operations work on the whole array, or whole rows and/or columns, and effectively do so in parallel. Loops that are inherently sequential (value from one loop depends on the previous) don't fit well. And different size lists or arrays in each loop are also a good indicator that 'vectorizing' will be difficult.
for i in range(A.size):
    B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
With a sample A and known f (as simple as arg1*arg2), I'd generate a B array, and look for patterns that treat B as a whole. At first glance it looks like your B is a lower triangle. There are functions to help index those. But that final sum might change the picture.
Sometimes I tackle these problems with a bottom up approach, trying to remove inner loops first. But in this case, I think some sort of big-picture approach is needed.
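As a rough sketch of such a whole-array approach, assuming f broadcasts over NumPy arrays (the simple arg1*arg2 placeholder is used here), the full square of f values can be computed at once and everything outside the strict lower triangle zeroed with np.tril. The function names my_function_loop and my_function_vectorized are made up for this comparison.

```python
import numpy as np

def f(arg1, arg2):
    # Placeholder assumption: any f built from ufuncs would broadcast alike.
    return arg1 * arg2

def my_function_loop(A):
    # The loop from the question, kept as the reference result.
    B = np.zeros((A.size, A.size))
    for i in range(A.size):
        B[i, :i] = f(A[i], A[:i]) * (A[i] - A[i - 1])
    return np.sum(B, axis=0)

def my_function_vectorized(A):
    # d[i] = A[i] - A[i-1]; d[0] wraps around to A[-1] exactly as in the
    # loop, but row 0 is removed by the triangle mask anyway.
    d = A - np.roll(A, 1)
    # full[i, j] = f(A[i], A[j]) * d[i], computed for every pair (i, j).
    full = f(A[:, None], A[None, :]) * d[:, None]
    # Keep only j < i, which is what B[i, :i] fills in the loop.
    B = np.tril(full, k=-1)
    return B.sum(axis=0)

A = np.array([1.0, 2.0, 4.0, 7.0])
loop_result = my_function_loop(A)
vectorized_result = my_function_vectorized(A)
print(np.allclose(loop_result, vectorized_result))
```

The trade-off is that f is also evaluated on the upper triangle, where the result is thrown away, and the temporary array is A.size by A.size. So this only helps when that array fits in memory and f is defined for every pair.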
I want to compare two matrices (size: 98000 x 64). The comparison should be done element by element, and I want the minimum of each pair stored in a third matrix with the same dimensions. I also want the comparison done without loops!
Here's a small example:
a=np.array([1,2,3])
b=np.array([4,1,2])
I need a function that compares the 1 with the 4, the 2 with the 1, and the 3 with the 2, and stores each minimum in the vector c.
answer
c=[1,1,2]
Is there an efficient way to do this?
NumPy has an element-wise minimum function, np.minimum, used as below:
c = np.minimum(a,b)
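The same call works unchanged on 2-D arrays such as the 98000 x 64 matrices in the question. A small sketch with made-up 2 x 3 inputs:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [9, 8, 7]])
b = np.array([[4, 1, 2],
              [5, 8, 10]])

# Element-wise minimum of the two arrays; the result has the same shape.
c = np.minimum(a, b)
print(c)
# [[1 1 2]
#  [5 8 7]]
```

Note that np.minimum compares two arrays element by element, whereas np.min reduces a single array along an axis, so the two are not interchangeable here.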
I am working with multiple 2-dimensional NumPy arrays (matrices), and I want to get some rows or columns from them (the same row or column indexes for each of the 3 matrices, each time). I was wondering whether I should use a dictionary or not.
If I do it with a dictionary, then each row of each matrix would be indexed by a word and would hold a list of the values that interest me. E.g., myDict['word'] would contain [1 5 2 49 0 2].
If I do it with an array myArray, for each i I would have an array contained within myArray[i]. E.g., myArray[5] would contain array([[1 2 4 9 1 23]]).
On these I need to implement basic get operations (get rows or get columns) and some matrix multiplications, but never sorting or insertions.
I know I can do it both ways; my question is mainly about performance. Which do you think would be faster and simpler?
Thanks a lot!
For matrix operations, I strongly recommend NumPy. To justify my choice, I first want to quote Wikipedia:
http://en.wikipedia.org/wiki/NumPy
"... any algorithm that can be expressed primarily as operations on arrays and matrices can run almost as quickly as the equivalent C code."
Besides that I notice that you want to have matrix multiplication functionality. Numpy provides you that, and of course in an efficient way.
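If the word labels are still wanted, one possible compromise is to keep the data in NumPy arrays and use a small dictionary only to translate a word into a row number. The sketch below uses made-up words, shapes, and random values; none of them come from the question.

```python
import numpy as np

# Hypothetical row labels and a word -> row-number lookup.
words = ["alpha", "beta", "gamma", "delta"]
row_of = {word: i for i, word in enumerate(words)}

# Three matrices with the same layout (4 rows, 6 columns), random contents.
rng = np.random.default_rng(0)
m1, m2, m3 = (rng.integers(0, 50, size=(4, 6)) for _ in range(3))

# Get the same rows from each matrix using integer (fancy) indexing.
rows = [row_of["beta"], row_of["delta"]]
rows_from_each = [m[rows] for m in (m1, m2, m3)]        # each is (2, 6)

# Get the same columns from each matrix.
cols_from_each = [m[:, [0, 3]] for m in (m1, m2, m3)]   # each is (4, 2)

# Matrix multiplication stays a single array operation.
product = m1 @ m2.T                                     # shape (4, 4)
```

The dictionary lookup happens once per query, while the row, column, and multiplication work is all done inside NumPy.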
I have two arrays in Python (numpy arrays):
a=array([5,7,3,5])
b=array([1,2,3,4])
and I wish to create a third array in which each element of b is repeated the number of times given by the corresponding element of a, as:
c=array([1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,4,4,4,4,4])
Is there a fast, numPythonic way of doing this with a minimum of looping? I need to use this operation thousands of times in a loop over a fairly large array, so I would like to have it be as fast as possible.
Cheers,
Mike
I believe NumPy's repeat is what you want:
c = repeat(b, a)
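Spelled out with an explicit import and the arrays from the question, a self-contained version might look like this:

```python
import numpy as np

a = np.array([5, 7, 3, 5])   # repeat counts
b = np.array([1, 2, 3, 4])   # values to repeat

# b[k] is repeated a[k] times; the result is a flat 1-D array.
c = np.repeat(b, a)
print(c)
# [1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 4 4 4 4 4]
```

The result has a.sum() elements, 20 in this case.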