I have a 560x560 numpy matrix, which I want to convert to a 28x28 one.
Therefore, I want to subdivide it into regions of size 20x20 (560/28 = 20), calculate the mean of each such region, and put that value in a new matrix.
Now I have:
import numpy as np

oldMat = ...  # load the 560x560 matrix
newMat = np.zeros((28, 28))  # initialize the new matrix of size 28x28
for i in range(0, 560, 20):
    for j in range(0, 560, 20):  # loop over the top-left corner of each region
        total = 0  # avoid shadowing the built-in sum()
        for di in range(20):
            for dj in range(20):  # loop over the elements in each region
                total += oldMat[i + di, j + dj]
        mean = total / 400  # mean of the 400 elements in the region
        newMat[i // 20, j // 20] = mean  # map the region to a single cell
Is there a faster way to do this? (I'm sure there is.)
You can reshape your matrix from 2D to 4D so that each 2x2 block becomes its own sub-array, using np.reshape() together with an axis swap. For example:
import numpy as np
np.random.seed(0)
data = np.random.randint(0,5,size=(6,6))
Yields:
[[4 0 3 3 3 1]
[3 2 4 0 0 4]
[2 1 0 1 1 0]
[1 4 3 0 3 0]
[2 3 0 1 3 3]
[3 0 1 1 1 0]]
Then reshape and swap the middle axes, so that the last two axes index within each spatial 2x2 tile:
data.reshape(3, 2, 3, 2).swapaxes(1, 2)
Returns:
[[[[4 0]
   [3 2]]
  [[3 3]
   [4 0]]
  [[3 1]
   [0 4]]]
 [[[2 1]
   [1 4]]
  [[0 1]
   [3 0]]
  [[1 0]
   [3 0]]]
 [[[2 3]
   [3 0]]
  [[0 1]
   [1 1]]
  [[3 3]
   [1 0]]]]
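From there, taking the mean over the two within-tile axes collapses each block to its average: data.reshape(3, 2, 3, 2).mean(axis=(1, 3)) returns the 3x3 matrix of block means directly (no swapaxes needed, since both tile axes are averaged away). Applied to the original problem, a minimal sketch, assuming oldMat is a 560x560 float array:

newMat = oldMat.reshape(28, 20, 28, 20).mean(axis=(1, 3))  # 28x28 matrix of 20x20 block means

This replaces the four nested loops with a single vectorized operation.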
I'm trying to change values in a matrix a, using the given index matrices d and e.
The matrix should always stay symmetric.
What I came up with is to overwrite the matrix using one index matrix, try to make the result symmetric, then apply the next overwrite, and so on until all the index matrices have been processed. That is not efficient.
And I'm stuck on how to make the matrix symmetric.
For example:
a = np.ones([4, 4], dtype=int)  # the original matrix
d = np.array([[1],
              [2],
              [0],
              [0]])  # the first index matrix
a[np.arange(a.shape[0])[:, None], d] = 2  # set a[i, d[i]] to 2 for each row i
Now the result is:
a = np.array([[1 2 1 1]
[1 1 2 1]
[2 1 1 1]
[2 1 1 1]])
After making it symmetric (if a[i][j] was selected via d, then a[j][i] should also be changed to 2; this is the part I don't know how to do), the expected output should be:
a = np.array([[1 2 2 2]
[2 1 2 1]
[2 2 1 1]
[2 1 1 1]])
Then, for the next overwrite:
e = np.array([[0], [2], [1], [1]])
a[np.arange(a.shape[0])[:, None], e] = 3
Now the result is:
a = np.array([[3 2 2 2]
[2 1 3 1]
[2 3 1 1]
[2 3 1 1]])
After making it symmetric (again, the part I don't know how to do), the final output should be (overwriting values that were 2 or 1 before):
a = np.array([[3 2 2 2]
[2 1 3 3]
[2 3 1 1]
[2 3 1 1]])
What should I do to make the matrix symmetric? And is there any way to update the matrix a directly to get the final result, in a more efficient way?
Thanks in advance!
You can simply apply the same change a second time with the first and second indices swapped; the result is then symmetric:
a[np.arange(a.shape[0])[:,None], d] = 2
a[d, np.arange(a.shape[0])[:,None]] = 2
output:
[[1 2 2 2]
[2 1 2 1]
[2 2 1 1]
[2 1 1 1]]
The same works for any number of further changes:
a[np.arange(a.shape[0])[:,None], e] = 3
a[e, np.arange(a.shape[0])[:,None]] = 3
output:
[[3 2 2 2]
[2 1 3 3]
[2 3 1 1]
[2 3 1 1]]
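If you would rather touch a only once per index matrix, one option is to stack the indices and their swapped counterparts into a single fancy-indexing assignment. A minimal sketch (np.r_ just concatenates the two index arrays):

rows = np.arange(a.shape[0])
cols = d.ravel()
a[np.r_[rows, cols], np.r_[cols, rows]] = 2  # writes (i, d[i]) and (d[i], i) together

This is equivalent to the two assignments above, just expressed as one.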
I'm working on a way to find the lowest 1-norm of a given matrix over permutations of its rows. The problem is that the permutation can't be fully random: the rows fall into 4 subsets, each marked by a special parameter in the row. I want to permute only the rows sharing a given parameter, keeping each subset in its original set of positions.
Ex. The first column defines the type of row.
A = [
1, val_11, val_12, ... #1. Row
2, val_21, val_22, ... #2. Row
2, val_31, val_32, ... #3. Row
2, val_41, val_42, ... #4. Row
1, val_51, val_52, ... #5. Row
]
So in this example I want to permute the 1st and 5th rows among themselves AND permute the 2nd, 3rd and 4th rows among themselves, keeping the type pattern [1; 2; 2; 2; 1] in place.
You just have to carefully define your permutations. Fancy indexing will then do the job:
Example:
from numpy.random import randint

M = randint(10, size=(5, 5))
after = [4, 3, 1, 2, 0]  # a permutation of the row indices
M0 = M[after]
print(M0)
print(M)
[[4 9 3 0 0]
[3 1 7 6 0]
[6 6 5 0 9]
[0 4 7 1 3]
[0 0 1 0 6]]
[[0 0 1 0 6]
[6 6 5 0 9]
[0 4 7 1 3]
[3 1 7 6 0]
[4 9 3 0 0]]
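For the question's setup you can also build such a group-preserving permutation automatically from the type column. A minimal sketch, assuming the types sit in column 0 of A:

import numpy as np

types = A[:, 0]
after = np.arange(len(types))
for t in np.unique(types):
    idx = np.flatnonzero(types == t)         # positions occupied by this type
    after[idx] = np.random.permutation(idx)  # shuffle rows only within the group
A_shuffled = A[after]  # the type pattern, e.g. [1, 2, 2, 2, 1], is preserved

Each subset of rows is permuted among its own positions, so the column of types never moves.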
There are a couple of questions related to this one; the most relevant, I think, is this question.
Let's say I have a dataset like this (highly simplified for demonstration purposes):
import numpy as np
import pandas as pd
from scipy.spatial import distance
from scipy.cluster import hierarchy
val = np.array([[0.20288834, 0.80406494, 4.59921579, 14.28184739],
[0.22477082, 1.43444223, 6.87992605, 12.90299896],
[0.22811485, 0.74509454, 3.85198421, 19.22564266],
[0.20374529, 0.73680174, 3.63178517, 17.82544951],
[0.22722696, 0.86113728, 3.00832186, 16.62306058],
[0.25577882, 0.85671779, 3.70655719, 17.49690061],
[0.23018219, 0.68039151, 2.50815837, 15.09039053],
[0.21638751, 1.12455083, 3.56246872, 18.82866991],
[0.26600895, 1.09415595, 2.85300018, 17.93139433],
[0.22369445, 0.73689845, 3.24919113, 18.60914745]])
df = pd.DataFrame(val, columns=["C{}".format(i) for i in range(val.shape[1])])
C0 C1 C2 C3
0 0.202888 0.804065 4.599216 14.281847
1 0.224771 1.434442 6.879926 12.902999
2 0.228115 0.745095 3.851984 19.225643
3 0.203745 0.736802 3.631785 17.825450
4 0.227227 0.861137 3.008322 16.623061
5 0.255779 0.856718 3.706557 17.496901
6 0.230182 0.680392 2.508158 15.090391
7 0.216388 1.124551 3.562469 18.828670
8 0.266009 1.094156 2.853000 17.931394
9 0.223694 0.736898 3.249191 18.609147
I want to cluster the columns of this dataframe and thereby also specify the number of clusters I obtain. Typically, this can be achieved by using the cut_tree function.
However, cut_tree is currently broken, so I looked for alternatives, which led me to the link at the beginning of this post where fcluster is suggested as an alternative.
The problem is that I don't see how to specify an exact number of clusters, only a maximum number via the maxclust argument.
So for my simple example from above I can do:
# number of target clusters
n_clusters = range(1, 5)
for n_clust in n_clusters:
    Z = hierarchy.linkage(distance.pdist(df.T.values), method='average', metric='euclidean')
    print("--------\nValues from fcluster:\n{}".format(hierarchy.fcluster(Z, n_clust, criterion='maxclust')))
    print("\nValues from cut_tree:\n{}".format(hierarchy.cut_tree(Z, n_clust).T))
which prints
Values from fcluster:
[1 1 1 1]
Values from cut_tree:
[[0 0 0 0]]
--------
Values from fcluster:
[1 1 1 2]
Values from cut_tree:
[[0 0 0 1]]
--------
Values from fcluster:
[1 1 1 2]
Values from cut_tree:
[[0 0 1 2]]
--------
Values from fcluster:
[1 1 1 2]
Values from cut_tree:
[[0 1 2 3]]
As one can see, fcluster returns 2 distinct clusters at maximum while cut_tree returns the desired number.
Is there a way to get the same output for fcluster for the time until the bug in cut_tree is fixed? If not, is there any other good alternative for this in another package?
Not sure how to get the right number of clusters out of fcluster here.
As an alternative, scikit-learn has AgglomerativeClustering:
from sklearn.cluster import AgglomerativeClustering

# number of target clusters
n_clusters = range(1, 5)
for n_clust in n_clusters:
    Z = hierarchy.linkage(distance.pdist(df.T.values), method='average', metric='euclidean')
    print("--------\nValues from fcluster:\n{}".format(hierarchy.fcluster(Z, n_clust, criterion='maxclust')))
    print("\nValues from cut_tree:\n{}".format(hierarchy.cut_tree(Z, n_clust).T))
    print("\nValues from AgglomerativeClustering:\n{}".format(AgglomerativeClustering(n_clusters=n_clust, affinity='euclidean', linkage='average').fit(df.T.values).labels_))
which returns the right number of clusters for the provided dataset (although with a different label order):
Values from fcluster:
[1 1 1 1]
Values from cut_tree:
[[0 0 0 0]]
Values from AgglomerativeClustering:
[0 0 0 0]
--------
Values from fcluster:
[1 1 1 2]
Values from cut_tree:
[[0 0 0 1]]
Values from AgglomerativeClustering:
[0 0 0 1]
--------
Values from fcluster:
[1 1 1 2]
Values from cut_tree:
[[0 0 1 2]]
Values from AgglomerativeClustering:
[0 0 2 1]
--------
Values from fcluster:
[1 1 1 2]
Values from cut_tree:
[[0 1 2 3]]
Values from AgglomerativeClustering:
[3 1 2 0]
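If you want to stay within SciPy, one workaround is to cut the dendrogram by distance instead of relying on maxclust. A sketch, assuming the merge heights in the linkage matrix Z are distinct (fcluster with criterion='distance' joins everything merged at or below the threshold, so thresholding at the k-th-highest merge height leaves exactly k flat clusters):

def fcluster_k(Z, k):
    # Cut the linkage matrix Z into exactly k flat clusters by
    # thresholding on cophenetic distance (hypothetical helper).
    n = Z.shape[0] + 1  # number of original observations
    if k >= n:
        # threshold below the first merge: every observation is its own cluster
        return hierarchy.fcluster(Z, Z[0, 2] / 2, criterion='distance')
    # include all merges up to height Z[-k, 2]; the k-1 highest merges are cut
    return hierarchy.fcluster(Z, Z[-k, 2], criterion='distance')

For the example above, fcluster_k(Z, n_clust) should then agree with cut_tree's grouping up to label naming.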
I have a problem with the np.nonzero() function in Python. I want to find all the row indices of a given array that are non-zero at certain positions. So, consider that I have the following code:
import numpy as np
from scipy.special import binom
M = 4
N = 3

def generate(N, nb):
    states = np.zeros((int(binom(nb + N - 1, nb)), N), dtype=int)
    states[0, 0] = nb
    ni = 0  # init
    for i in range(1, states.shape[0]):
        states[i, :N-1] = states[i-1, :N-1]
        states[i, ni] -= 1
        states[i, ni+1] += 1 + states[i-1, N-1]
        if ni >= N-2:
            if np.any(states[i, :N-1]):
                ni = np.nonzero(states[i, :N-1])[0][-1]
        else:
            ni += 1
    return states

base = generate(M, N)
The result of base is given by:
base = [[3 0 0 0]
[2 1 0 0]
[2 0 1 0]
[2 0 0 1]
[1 2 0 0]
[1 1 1 0]
[1 1 0 1]
[1 0 2 0]
[1 0 1 1]
[1 0 0 2]
[0 3 0 0]
[0 2 1 0]
[0 2 0 1]
[0 1 2 0]
[0 1 1 1]
[0 1 0 2]
[0 0 3 0]
[0 0 2 1]
[0 0 1 2]
[0 0 0 3]]
The point is that for given indices j, k I want to take all the rows of base that have non-zero components at both sites j and k. For example:
Taking j=0, k=1 I have to obtain:
result = [1 4 5 6]
which corresponds to the rows 1, 4, 5, 6 of base that satisfy this condition. On the other hand, I have used the command:
np.nonzero((base[:, j]) & (base[:, k]))[0]
but it doesn't work correctly. Any idea why?
The indexing is fine: base is a NumPy array (not a plain list), so base[:, j] is valid. The actual problem is that & is a bitwise AND on the integer values, not a test for "both non-zero". For instance, row 1 has the values 2 and 1 at sites 0 and 1, and 2 & 1 == 0, so that row is dropped even though both entries are non-zero.
Compare against zero first:
np.nonzero((base[:, j] != 0) & (base[:, k] != 0))[0]
or multiply the two columns, since a product of non-negative counts is non-zero exactly when both factors are:
j = 0; k = 1
np.nonzero(base[:, j] * base[:, k])[0]
Both give:
array([1, 4, 5, 6])
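An explicitly boolean variant of the same test, for readability:

np.flatnonzero(np.logical_and(base[:, j], base[:, k]))  # same result: array([1, 4, 5, 6])

np.flatnonzero is just shorthand for np.nonzero(...)[0] on a 1-D condition.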
I have a 2-dimensional array of integers, we'll call it "A".
I want to create a 3-dimensional array "B" of all 1s and 0s such that:
for any fixed (i, j), sum(B[i,j,:]) == A[i,j], that is, B[i,j,:] contains A[i,j] 1s in it
the 1s are randomly placed in the 3rd dimension.
I know how I would do this using standard python indexing but this turns out to be very slow.
I am looking for a way to do this that takes advantage of the features that can make Numpy fast.
Here is how I would do it using standard indexing:
B = np.zeros((X, Y, Z))
indexoptions = range(Z)
for i in range(X):
    for j in range(Y):  # B has shape (X, Y, Z), so i runs over X and j over Y
        replacedindices = np.random.choice(indexoptions, size=A[i, j], replace=False)
        B[i, j, replacedindices] = 1
Can someone please explain how I can do this in a faster way?
Edit: Here is an example "A":
A=np.array([[0,1,2,3,4],[0,1,2,3,4],[0,1,2,3,4],[0,1,2,3,4],[0,1,2,3,4]])
in this case X=Y=5 and Z>=5
Essentially the same idea as @JohnZwinck and @DSM, but with a shuffle function for shuffling a given axis:
import numpy as np

def shuffle(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.

    Applies numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis.  `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return

def random_bits(a, n):
    # a[i, j] > np.arange(n) puts 1s in the first a[i, j] slots of each
    # length-n slice; shuffling each slice randomizes their placement.
    b = (a[..., np.newaxis] > np.arange(n)).astype(int)
    shuffle(b)
    return b

if __name__ == "__main__":
    np.random.seed(12345)
    A = np.random.randint(0, 5, size=(3, 4))
    Z = 6
    B = random_bits(A, Z)
    print("A:")
    print(A)
    print("B:")
    print(B)
Output:
A:
[[2 1 4 1]
[2 1 1 3]
[1 3 0 2]]
B:
[[[1 0 0 0 0 1]
[0 1 0 0 0 0]
[0 1 1 1 1 0]
[0 0 0 1 0 0]]
[[0 1 0 1 0 0]
[0 0 0 1 0 0]
[0 0 1 0 0 0]
[1 0 1 0 1 0]]
[[0 0 0 0 0 1]
[0 0 1 1 1 0]
[0 0 0 0 0 0]
[0 0 1 0 1 0]]]
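If the Python-level loop over slices in shuffle ever becomes the bottleneck, the same idea can be fully vectorized with the classic argsort-of-random-noise trick. A sketch (random_bits_argsort is a hypothetical name, not from the original answer):

import numpy as np

def random_bits_argsort(a, n):
    # argsort of i.i.d. uniform noise yields an independent, uniformly random
    # permutation of 0..n-1 along the last axis; marking entries whose
    # permutation value is below a[i, j] places a[i, j] ones at random slots.
    perm = np.random.rand(*a.shape, n).argsort(axis=-1)
    return (perm < a[..., np.newaxis]).astype(int)

B = random_bits_argsort(A, Z)  # same contract as random_bits(A, Z)

Each slice B[i, j, :] then contains exactly A[i, j] ones at uniformly random positions, with no explicit Python loops.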