I would like to perform non-trivial operations on a 2D array using a sliding window in Python.
To be more precise, here is an example. Suppose we have a 10x10 matrix and a 3x3 sliding window. Starting from the very first element (1,1), I would like to create a new matrix of the same dimensions in which each element holds the result of an operation (a percentile of the numbers, a more complex operation, and so on) computed over all the elements covered by the window. I can do this with the function np.lib.stride_tricks.as_strided, but for big arrays it gives a memory error.
Does anyone know a better solution?
Do you mean to create a new matrix with the same values as the windows in order to alter its items without modifying the main matrix? If so, I think that you can use the copy method in order to avoid modifying the main matrix.
Numpy Copy method
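One way to keep memory bounded is to build the window view lazily and reduce it a block of rows at a time. The sketch below is only an illustration under assumptions that are not in the question: the window is centred on each element, the borders are filled by repeating the edge values, the window size is odd, and NumPy 1.20 or later is available for sliding_window_view. The names windowed_percentile and rows_per_chunk are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_percentile(a, size=3, q=50, rows_per_chunk=256):
    """Percentile over a size x size window, computed in blocks of rows.

    Assumes an odd window size, a window centred on each element,
    and edge-value padding at the borders.
    """
    pad = size // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.empty(a.shape, dtype=float)
    for start in range(0, a.shape[0], rows_per_chunk):
        stop = min(start + rows_per_chunk, a.shape[0])
        # Output rows start..stop need padded rows start..stop + 2 * pad.
        block = padded[start:stop + 2 * pad]
        # The view itself allocates nothing; np.percentile makes a
        # temporary copy, but only for this block of rows.
        windows = sliding_window_view(block, (size, size))
        out[start:stop] = np.percentile(windows, q, axis=(-2, -1))
    return out


a = np.arange(100.0).reshape(10, 10)
result = windowed_percentile(a, size=3, q=50, rows_per_chunk=4)
print(result.shape)
```

The temporary memory scales with rows_per_chunk rather than with the whole array, so the chunk size can be lowered until the computation fits.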
Related
Let's say I have a matrix M of size 10x5, and a set of indices ix of size 4x3. I want to do tf.reduce_sum(tf.gather(M,ix),axis=1) which would give me a result of size 4x5. However, to do this, it creates an intermediate gather matrix of size 4x3x5. While at these small sizes this isn't a problem, if these sizes grow large enough, I get an OOM error. However, since I'm simply doing a sum over the 1st dimension, I never need to calculate the full matrix. So my question is, is there a way to calculate the end 4x5 matrix without going through the intermediate 4x3x5 matrix?
I think you can just multiply by a sparse matrix. I was searching for whether the two are internally equivalent when I landed on your post.
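The sparse-matrix idea can be sketched outside TensorFlow with NumPy and SciPy. This is an illustration of the equivalence only, not TensorFlow code: the random data, the seed and the names A, rows and cols are assumptions made for the example. Each row of A counts how often every row of M is selected, so A @ M yields the 4x5 sums without materialising the 4x3x5 gather.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
M = rng.normal(size=(10, 5))            # data matrix, 10x5
ix = rng.integers(0, 10, size=(4, 3))   # row indices, 4x3

n_out, k = ix.shape
rows = np.repeat(np.arange(n_out), k)   # output row of each index
cols = ix.ravel()                       # row of M that index selects
# Duplicate (row, col) pairs are summed, so repeated indices count twice.
A = sparse.csr_matrix(
    (np.ones(rows.size), (rows, cols)), shape=(n_out, M.shape[0])
)

result = A @ M                          # shape (4, 5), no 4x3x5 intermediate
reference = M[ix].sum(axis=1)           # dense gather-then-sum, for comparison
print(np.allclose(result, reference))
```

In TensorFlow the same product could presumably be expressed with a tf.sparse.SparseTensor and tf.sparse.sparse_dense_matmul, though that variant is not shown or tested here.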
I'm new to Python, so this is a two-part question. First, I don't understand what this code means. What is DESCR for? Is it supposed to be a description? And in the split part, I don't understand the values:
datasets = [ds.DESCR.split()[0] for ds in datasets]
clf_name = [str(clf).split('(')[0][:12] for clf in clfs]
Second, when do I use np.ones or np.zeros? I know they generate an array of ones or zeros, but what I mean is: when, specifically in data science, do I need to initialize an array with ones or zeros?
This code is creating two lists using list comprehension.
The ds.DESCR and other expressions here can mean anything, depending on the context.
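To make the two comprehensions concrete, here is a small sketch with stand-in objects. Everything in it is hypothetical: the DESCR strings, the FakeClassifier class and its repr text are invented for illustration and are not taken from any real library.

```python
from types import SimpleNamespace

# Stand-ins for dataset objects that carry a DESCR description string.
datasets = [
    SimpleNamespace(DESCR="Iris plants dataset\nmore text"),
    SimpleNamespace(DESCR="Wine recognition dataset\nmore text"),
]


class FakeClassifier:
    """Stand-in whose string form looks like 'Name(params)'."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return self.text


clfs = [
    FakeClassifier("RandomForestClassifier(n_estimators=100)"),
    FakeClassifier("SVC(C=1.0)"),
]

# split() cuts the description at whitespace; [0] keeps the first word.
dataset_names = [ds.DESCR.split()[0] for ds in datasets]
# split('(')[0] keeps the text before the first '('; [:12] truncates it
# to at most 12 characters.
clf_names = [str(clf).split('(')[0][:12] for clf in clfs]

print(dataset_names)  # ['Iris', 'Wine']
print(clf_names)      # ['RandomForest', 'SVC']
```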
As for your second sub-question, I'd advise you to ask something more specific.
If you need ones, you use np.ones, if you need zeros, you use np.zeros. That's it.
np.zeros is great if, for example, you gradually update your matrix with values. Every entry that is not updated by your algorithm stays zero.
In practice this could be a matrix that shows you the edges in a picture. You create a matrix the size of the picture, filled with zeros, and then go over the picture with a kernel that detects edges. For every edge you detect, you increase the value in the matrix at the position of the detected edge.
A matrix or vector of ones is useful in some matrix multiplications. For example, multiplying a vector of shape (n,1) by a vector of ones of shape (1,n) expands the vector to a matrix of shape (n,n). This and similar cases can make a vector or matrix of ones necessary.
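Both uses can be shown in a few lines. This is a minimal sketch with made-up values: the hits list stands in for positions reported by some detector, and the names counts and expanded are chosen only for the example.

```python
import numpy as np

# Zeros as an accumulator: start empty and increment where something is found.
counts = np.zeros((4, 4), dtype=int)
hits = [(0, 1), (2, 3), (0, 1)]      # hypothetical detected positions
for r, c in hits:
    counts[r, c] += 1                # untouched entries stay 0

# Ones to expand a column vector: (n, 1) @ (1, n) -> (n, n).
v = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1)
expanded = v @ np.ones((1, 3))       # every column is a copy of v

print(counts)
print(expanded)
```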
I have a matrix
x=np.mat('0.1019623; 0.1019623; 0.1019623')
and I want to find the exponential of every element and have it in a matrix of the same size. One way I found was by converting to array and proceed. However, this won't be a solution if we have, let's say, a 2x3 matrix. Is there a general solution?
The problem was with me using math.exp instead of np.exp.
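A short sketch of the difference, using an invented 2x3 array alongside the column from the question: np.exp is applied element by element and keeps the input's shape, while math.exp only accepts a single number.

```python
import math

import numpy as np

x = np.array([[0.0, 1.0, 2.0],
              [0.5, 1.5, 2.5]])
y = np.exp(x)                 # element-wise, same 2x3 shape

# The same call works on a matrix object and keeps its 3x1 shape.
col = np.asmatrix([[0.1019623], [0.1019623], [0.1019623]])
col_exp = np.exp(col)

try:
    math.exp(x)               # math.exp expects one scalar value
    math_failed = False
except TypeError:
    math_failed = True

print(y.shape, col_exp.shape, math_failed)
```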
I am working with a large matrix of size m * n for m,n>100000. Since my data is huge I want to store the matrix in memory and work with HDF5, and PyTables.
However, the elements of my matrix are small matrices of real values of dimension 5*5.
I have already looked at the following post, but I would like to know if there is any other way of storing this type of data in tables?
(Create a larger matrix from smaller matrices in numpy)
Thank you in advance
In numpy there are two relevant structures.
One is a 4-dimensional array, e.g. np.zeros((100,100,5,5),int). The other is a 2-dimensional array of objects, np.zeros((100,100),dtype=object). With an object array, the elements can be anything (strings, numbers, lists, your 5x5 arrays, some other 7x3 array, None, etc.).
It is easiest to do math on the 4d array, for example taking the mean across all the 5x5 subarrays, or finding the [:,:,0,0] corner of all.
If your subarrays are all 5x5, it can be tricky to create and fill that object array, because np.array(...) tries to create the 4-dimensional array whenever possible.
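The two layouts can be compared in a small sketch. The sizes and fill values below are chosen only for illustration; filling the object array element by element is one way to sidestep np.array(...) collapsing equally sized blocks into a 4-dimensional array.

```python
import numpy as np

# Layout 1: one 4-dimensional array, a 100x100 grid of 5x5 blocks.
blocks = np.zeros((100, 100, 5, 5))
blocks[3, 7] = np.arange(25).reshape(5, 5)

block_means = blocks.mean(axis=(2, 3))  # mean of every block, shape (100, 100)
corners = blocks[:, :, 0, 0]            # [0, 0] corner of every block

# Layout 2: a 2-dimensional object array holding one small array per cell.
obj = np.empty((2, 2), dtype=object)
for i in range(2):
    for j in range(2):
        obj[i, j] = np.full((5, 5), i + j)  # assign block by block

print(block_means[3, 7], corners.shape, obj.shape, obj[1, 1].shape)
```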
With h5py you can chunk the file, and access portions of the larger array. But you still have to have a workable numpy representation to do anything with them.
I need to form a 2D matrix with a total size of 2,886 x 2,003,817. I tried to use numpy.zeros to make a 2D matrix of zeros and then calculate and assign each element of the matrix (most of them are zeros, so I only need to replace a few of them).
But when I use numpy.zeros to initialize my matrix, I get the following memory error:
C=numpy.zeros((2886,2003817)) "MemoryError"
I also tried to form the matrix without initialization. Basically, I calculate the elements of one row in each iteration of my algorithm and then append it:
C=numpy.concatenate((C,[A]),axis=0)
Here C is my final matrix and A is the row calculated at the current iteration. But I found that this method takes a lot of time; I am guessing it is because of numpy.concatenate(?)
Could you please let me know if there is a way to avoid the memory error when initializing my matrix, or if there is a better method or suggestion for forming a matrix of this size?
Thanks,
Amir
If your data has a lot of zeros in it, you should use a scipy.sparse matrix.
It is a special data structure designed to save memory for matrices that have a lot of zeros. However, if your matrix is not that sparse, a sparse matrix can take up more memory than a dense one. There are several sparse formats, and each of them is efficient at some operations and inefficient at others, so be careful with which one you choose.
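As a rough sketch of that advice for the shape in the question: the dense array would need one 8-byte float per element, while a sparse matrix stores only the non-zero entries. The choice of the LIL format for filling and CSR afterwards is an assumption for this example (LIL is commonly used for incremental assignment, CSR for arithmetic), and the three assigned values are made up.

```python
from scipy import sparse

shape = (2886, 2003817)

# Memory a dense float64 array of this shape would need.
dense_bytes = shape[0] * shape[1] * 8
print(f"dense array: about {dense_bytes / 1e9:.1f} GB")

# LIL format supports cheap element-by-element assignment.
C = sparse.lil_matrix(shape)
C[0, 5] = 1.5
C[10, 2000000] = -2.0
C[2885, 2003816] = 7.0

# Convert once filling is done; CSR is better suited to arithmetic.
C = C.tocsr()
print(C.shape, C.nnz)
```

Only the three stored entries and a small amount of index data are kept in memory, which is why the allocation succeeds where numpy.zeros fails.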