I'm trying to find the lowest 1-norm of a given matrix over permutations of its rows. The catch is that the permutation can't be fully random: the matrix contains four subsets of rows, each marked by a special parameter. I want to permute rows only within the subset sharing a parameter value, keeping the parameter pattern itself in place.
Example: the first column defines the type of each row.
A = [
1, val_11, val_12, ...  # row 1
2, val_21, val_22, ...  # row 2
2, val_31, val_32, ...  # row 3
2, val_41, val_42, ...  # row 4
1, val_51, val_52, ...  # row 5
]
So in this example I want to permute rows 1 and 5 among themselves, AND permute rows 2, 3 and 4 among themselves, keeping the type pattern [1;2;2;2;1] in place.
You just have to define your permutation carefully; fancy indexing will then do the job.
Example:
from numpy.random import randint
M = randint(10, size=(5, 5))  # original matrix
after = [4, 2, 3, 1, 0]       # new row order: rows 0/4 swap, rows 1-3 permute among themselves
M0 = M[after]                 # fancy indexing reorders the rows
print(M)
print(M0)
[[4 9 3 0 0]
[3 1 7 6 0]
[6 6 5 0 9]
[0 4 7 1 3]
[0 0 1 0 6]]
[[0 0 1 0 6]
[6 6 5 0 9]
[0 4 7 1 3]
[3 1 7 6 0]
[4 9 3 0 0]]
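To build such a constrained permutation programmatically, one option (a minimal sketch; constrained_permutation is a hypothetical helper, not part of the original answer) is to shuffle the row indices within each type group, so the type column never moves:

import numpy as np

def constrained_permutation(types, rng=None):
    # Return a row order that only permutes rows sharing the same type value.
    rng = np.random.default_rng() if rng is None else rng
    types = np.asarray(types)
    order = np.arange(len(types))
    for t in np.unique(types):
        idx = np.flatnonzero(types == t)    # positions occupied by this type
        order[idx] = rng.permutation(idx)   # shuffle only within the group
    return order

types = np.array([1, 2, 2, 2, 1])
after = constrained_permutation(types)
# types[after] is always [1 2 2 2 1], so M[after] leaves the type column in place

Sampling or enumerating such orders then lets you search for the row order that minimises your norm.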
I have a problem with the execution of np.argpartition. I have an ndarray:
example = np.array([[5,6,7,3,4],[1,2,3,7,5],[6,7,4,2,3],[1,2,3,5,9],[2,3,6,1,2]])
out: [[5 6 7 3 4]
[1 2 3 7 5]
[6 7 4 2 3]
[1 2 3 5 9]
[2 3 6 1 2]]
I can get the indices of the sorted array with np.argsort:
print(np.argsort(example))
out:
[[3 4 0 1 2]
[0 1 2 4 3]
[3 4 2 0 1]
[0 1 2 3 4]
[3 0 4 1 2]]
I want to avoid the full np.argsort to save some execution time, because I only need the 3 smallest elements, sorted, in each row of this array. I use this code to do it:
print(np.argpartition(example, 3, axis=1))
out: [[3 4 0 1 2]
[1 0 2 4 3]
[3 4 2 0 1]
[1 0 2 3 4]
[3 4 0 1 2]]
I expect that the first three indices of each row will match the indices in the sorted array, but this is not the case. I don't understand what I did wrong.
np.argpartition(example, k, axis=1) does not return indices that sort the first k elements. It only guarantees that the element at position k lands in its sorted place, with everything smaller before it and everything larger after it, in no particular order. If you look at your output, only the 4th element (index 3) matches argsort().
If you want the first three elements sorted, you have to pass a list as the kth parameter:
index_array = np.argpartition(example, [0,1,2], axis=1)
print(np.take_along_axis(example, index_array, axis=1))  # rows with their first 3 elements sorted
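Alternatively (a sketch, not part of the original answer), you can keep the scalar kth and sort just the three partitioned columns afterwards, which avoids sorting whole rows:

import numpy as np

example = np.array([[5,6,7,3,4],[1,2,3,7,5],[6,7,4,2,3],[1,2,3,5,9],[2,3,6,1,2]])
part = np.argpartition(example, 3, axis=1)[:, :3]    # indices of the 3 smallest per row, unordered
vals = np.take_along_axis(example, part, axis=1)     # the corresponding values
order = np.argsort(vals, axis=1)                     # sort only those 3 values per row
smallest3 = np.take_along_axis(part, order, axis=1)  # indices of the 3 smallest, in sorted order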
I'm trying to change values in matrix a using the given index matrices d and e, and the matrix should always stay symmetrical.
My plan is to overwrite the original matrix at the given indices, make it symmetrical, then do the next overwrite, until all the index matrices have been applied. That's not efficient, and I'm stuck on how to make it symmetrical.
For example:
a = np.ones([4,4], dtype=object)  # the original matrix
d = np.array([[1],
              [2],
              [0],
              [0]])  # the first index matrix
a[np.arange(a.shape[0])[:,None], d] = 2  # row i gets a 2 at column d[i]
Now the result is:
[[1 2 1 1]
 [1 1 2 1]
 [2 1 1 1]
 [2 1 1 1]]
Now I want to make it symmetrical: if a[i][j] was selected via d, then a[j][i] should also be set to 2 (this is the part I can't work out). The expected output should be:
[[1 2 2 2]
 [2 1 2 1]
 [2 2 1 1]
 [2 1 1 1]]
Then comes another overwrite:
e = np.array([[0],[2],[1],[1]])
a[np.arange(a.shape[0])[:,None], e] = 3
Now the result is:
[[3 2 2 2]
 [2 1 3 1]
 [2 3 1 1]
 [2 3 1 1]]
After making it symmetrical (again the part I can't work out), the final output should be (overwriting the values that were 1 or 2 before):
[[3 2 2 2]
 [2 1 3 3]
 [2 3 1 1]
 [2 3 1 1]]
What should I do to keep the matrix symmetrical? And is there any way to change the original matrix a directly, more efficiently, to get the final result?
Thanks in advance!
You can simply swap the row and column indices and apply the same change again; the result will be symmetrical:
a[np.arange(a.shape[0])[:,None], d] = 2
a[d, np.arange(a.shape[0])[:,None]] = 2
output:
[[1 2 2 2]
[2 1 2 1]
[2 2 1 1]
[2 1 1 1]]
Same with any number of other changes:
a[np.arange(a.shape[0])[:,None], e] = 3
a[e, np.arange(a.shape[0])[:,None]] = 3
output:
[[3 2 2 2]
[2 1 3 3]
[2 3 1 1]
[2 3 1 1]]
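If you have several index matrices to apply, you can wrap the pair of assignments in a small helper (set_symmetric is a hypothetical name; a minimal sketch using the same indexing scheme as above):

import numpy as np

def set_symmetric(a, idx, value):
    # Write `value` at (row, idx[row]) and mirror it at (idx[row], row).
    rows = np.arange(a.shape[0])[:, None]
    a[rows, idx] = value
    a[idx, rows] = value

a = np.ones((4, 4), dtype=int)
d = np.array([[1], [2], [0], [0]])
e = np.array([[0], [2], [1], [1]])
set_symmetric(a, d, 2)
set_symmetric(a, e, 3)  # a is now the final symmetric matrix from the question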
I have a 560x560 numpy matrix, which I want to convert to a 28x28 one.
Therefore, I want to subdivide it into regions of size 16x16, calculate the mean of each such region, and put that value in a new matrix.
Now I have:
import numpy as np
oldMat = ...  # load the 560x560 matrix
newMat = np.zeros((28, 28))  # initialize the new 28x28 matrix
for i in range(0, 560, 16):
    for j in range(0, 560, 16):  # loop over the top-left corner of each region
        total = 0
        for di in range(16):
            for dj in range(16):  # loop over the elements of the region
                total += oldMat[i + di, j + dj]
        newMat[i // 16, j // 16] = total / 256  # mean of the 16x16 region
Is there a faster way to do this? (I'm sure there is.)
If you simply want to reshape your matrix from 2D --> 4D, then you can use np.reshape():
import numpy as np
np.random.seed(0)
data = np.random.randint(0,5,size=(6,6))
Yields:
[[4 0 3 3 3 1]
[3 2 4 0 0 4]
[2 1 0 1 1 0]
[1 4 3 0 3 0]
[2 3 0 1 3 3]
[3 0 1 1 1 0]]
Then reshape:
data.reshape((3,3,2,2))
Returns:
[[[[4 0]
   [3 3]]

  [[3 1]
   [3 2]]

  [[4 0]
   [0 4]]]


 [[[2 1]
   [0 1]]

  [[1 0]
   [1 4]]

  [[3 0]
   [3 0]]]


 [[[2 3]
   [0 1]]

  [[3 3]
   [3 0]]

  [[1 1]
   [1 0]]]]
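Note that reshape((3,3,2,2)) groups four consecutive row elements rather than spatial 2x2 blocks. For the block means in the question, each axis has to be split into (number of blocks, block size) before averaging; a sketch for the 560x560 case, with oldMat standing in for the loaded matrix:

import numpy as np

oldMat = np.random.rand(560, 560)  # stand-in for the loaded 560x560 matrix
# split rows into 28 blocks of 16 and columns into 28 blocks of 16,
# then average over the two block-size axes
newMat = oldMat.reshape(28, 16, 28, 16).mean(axis=(1, 3))
print(newMat.shape)  # (28, 28)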
Sample Data:
id cluster
1 3
2 3
3 3
4 3
5 1
6 1
7 2
8 2
9 2
10 4
11 4
12 5
13 6
What I would like to do is relabel the id of the largest cluster to 0, the second largest to 1, and so on. The output would be as shown below.
id cluster
1 0
2 0
3 0
4 0
5 2
6 2
7 1
8 1
9 1
10 3
11 3
12 4
13 5
I'm not quite sure where to start with this. Any help would be much appreciated.
The objective is to relabel the groups defined in the 'cluster' column by the rank of each group's value count within the column. We'll break this down into several steps:
1. Integer factorization: find an integer representation in which each unique value in the column gets its own integer, starting from zero.
2. Get the counts of each of these unique values.
3. Rank the unique values by their counts.
4. Assign the ranks back to the positions of the original column.
Approach 1
Using Numpy's numpy.unique + argsort
TL;DR
u, i, c = np.unique(
df.cluster.values,
return_inverse=True,
return_counts=True
)
(-c).argsort()[i]
Turns out, numpy.unique performs the task of integer factorization and counting values in one go. In the process, we get unique values as well, but we don't really need those. Also, the integer factorization isn't obvious. That's because per the numpy.unique function, the return value we're looking for is called the inverse. It's called the inverse because it was intended to act as a way to get back the original array given the array of unique values. So if we let
u, i, c = np.unique(
df.cluster.values,
return_inverse=True,
return_counts=True
)
You'll see i looks like:
array([2, 2, 2, 2, 0, 0, 1, 1, 1, 3, 3, 4, 5])
And if we did u[i] we get back the original df.cluster.values
array([3, 3, 3, 3, 1, 1, 2, 2, 2, 4, 4, 5, 6])
But we are going to use it as integer factorization.
Next, we need the counts c
array([2, 3, 4, 2, 1, 1])
I'm going to use argsort for this, but it can be confusing, so I'll try to illustrate it:
np.row_stack([c, (-c).argsort()])
array([[2, 3, 4, 2, 1, 1],
       [2, 1, 0, 3, 4, 5]])
What argsort does is place, in the top spot (position 0), the position to draw from in the originating array.
# position 2
# is best
#               |
#               v
# array([[2, 3, 4, 2, 1, 1],
#        [2, 1, 0, 3, 4, 5]])
#          ^
#          |
# top spot
# from
# position 2

# position 1
# goes to
# second spot
#            |
#            v
# array([[2, 3, 4, 2, 1, 1],
#        [2, 1, 0, 3, 4, 5]])
#             ^
#             |
# second spot
# from
# position 1
What this allows us to do is slice this argsort result with our integer factorization to arrive at a remapping of the ranks. (Strictly speaking, argsort gives "draw from" positions, and a rank remap needs the inverse permutation, (-c).argsort().argsort(); the two coincide here because this particular permutation is its own inverse. See the sketch after the diagram below.)
# i is
# [2 2 2 2 0 0 1 1 1 3 3 4 5]
# (-c).argsort() is
# [2 1 0 3 4 5]
# argsort
#  slice
#   \ /  This is our integer factorization
#   a i
# [[0 2] <-- 0 is second position in argsort
# [0 2] <-- 0 is second position in argsort
# [0 2] <-- 0 is second position in argsort
# [0 2] <-- 0 is second position in argsort
# [2 0] <-- 2 is zeroth position in argsort
# [2 0] <-- 2 is zeroth position in argsort
# [1 1] <-- 1 is first position in argsort
# [1 1] <-- 1 is first position in argsort
# [1 1] <-- 1 is first position in argsort
# [3 3] <-- 3 is third position in argsort
# [3 3] <-- 3 is third position in argsort
# [4 4] <-- 4 is fourth position in argsort
# [5 5]] <-- 5 is fifth position in argsort
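To make that caveat concrete, here is a short sketch (my addition, not in the original answer) of the general "send to" remap via a double argsort, which on this data gives the same result because the permutation is self-inverse:

p = (-c).argsort()      # draw-from order: [2 1 0 3 4 5]
ranks = p.argsort()     # send-to ranks:   [2 1 0 3 4 5] (identical here, since p is an involution)
print(ranks[i])         # [0 0 0 0 2 2 1 1 1 3 3 4 5]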
We can then drop it into the column with pd.DataFrame.assign
u, i, c = np.unique(
df.cluster.values,
return_inverse=True,
return_counts=True
)
df.assign(cluster=(-c).argsort()[i])
id cluster
0 1 0
1 2 0
2 3 0
3 4 0
4 5 2
5 6 2
6 7 1
7 8 1
8 9 1
9 10 3
10 11 3
11 12 4
12 13 5
Approach 2
I'm going to leverage the same concepts, but use pandas.factorize for the integer factorization and numpy.bincount to count values. The reason: numpy.unique sorts the values in the midst of factorizing and counting, while pandas.factorize does not. For larger data sets, big O is our friend, as this approach stays O(n) while the numpy approach is O(n log n).
i, u = pd.factorize(df.cluster.values)
c = np.bincount(i)
df.assign(cluster=(-c).argsort()[i])
id cluster
0 1 0
1 2 0
2 3 0
3 4 0
4 5 2
5 6 2
6 7 1
7 8 1
8 9 1
9 10 3
10 11 3
11 12 4
12 13 5
You can use groupby, transform, and rank. Note that method='dense' gives clusters with equal counts the same label (here clusters 1 and 4, and likewise 5 and 6), so the ties come out differently from the expected output:
df['cluster'] = df.groupby('cluster')['cluster'].transform('count')\
                  .rank(ascending=False, method='dense')\
                  .sub(1).astype(int)
Output:
id cluster
0 1 0
1 2 0
2 3 0
3 4 0
4 5 2
5 6 2
6 7 1
7 8 1
8 9 1
9 10 2
10 11 2
11 12 3
12 13 3
Using category codes and value_counts. As with the dense rank above, clusters with tied counts share a label:
df.cluster.map((-df.cluster.value_counts()).astype('category').cat.codes
)
Out[151]:
0 0
1 0
2 0
3 0
4 2
5 2
6 1
7 1
8 1
9 2
10 2
11 3
12 3
Name: cluster, dtype: int8
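Unpacking the one-liner (a sketch of the intermediate objects, assuming the same df): value_counts sorts the clusters by frequency, negating the counts makes bigger clusters sort first, and the categorical codes turn those values into ranks, which map applies back to every row. Because the codes are computed from the count values alone, tied counts share a code:

vc = df.cluster.value_counts()              # 3 -> 4, 2 -> 3, 1 -> 2, 4 -> 2, 5 -> 1, 6 -> 1
codes = (-vc).astype('category').cat.codes  # 3 -> 0, 2 -> 1, 1 -> 2, 4 -> 2, 5 -> 3, 6 -> 3
result = df.cluster.map(codes)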
This isn't the cleanest solution but it does work. Feel free to suggest improvements:
valueCounts = df.groupby('cluster')['cluster'].count()
valueCounts_sorted = valueCounts.sort_values(ascending=False)
original = df.cluster.copy()  # freeze the original labels so earlier relabels don't collide
count = 0
for i in valueCounts_sorted.index.values:
    idx = df[original == i].index.values
    df.loc[idx, "cluster"] = count
    count += 1
I have a 2D coefficient array COEFF with size row x col and a position array POS with size n x 2.
The goal is to create a batched array BAT with size n x (2*l) x (2*l), where l is the half-length of the subarray. It looks like this:
BAT[i, :, :] = COEFF[POS[i, 1] - l:POS[i, 1] + l, POS[i, 0] - l:POS[i, 0] + l]
It is possible to generate BAT with the sequential code above, but I'm wondering whether there is an efficient way to construct the BAT array in parallel.
Thanks!
I'm not aware of a perfectly satisfactory solution for mixing advanced indexing and slicing in that way, but the following may be acceptable (assuming that by "parallel" you mean "vectorised"). Note that it uses the first column of pos as the row index; if you follow the question's convention, where POS[i, 1] indexes the rows, swap the roles of the two columns:
import numpy as np
nrow, ncol = 7, 7
n, l = 3, 2
coeff = np.random.randint(0,10, (nrow,ncol))
pos = np.c_[np.random.randint(l, nrow-l+1, (n,)),np.random.randint(l, ncol-l+1, (n,))]
i = (pos[:, :1] + np.arange(-l, l))[:, :, None]  # row indices, shape (n, 2l, 1)
j = (pos[:, 1:] + np.arange(-l, l))[:, None, :]  # column indices, shape (n, 1, 2l)
print(coeff, '\n')
print(pos, '\n')
print(coeff[i, j])
Prints:
# [[7 6 7 6 3 9 9]
# [3 6 8 3 4 8 6]
# [3 7 4 7 4 6 8]
# [0 7 2 3 7 0 4]
# [8 5 2 0 0 1 7]
# [4 6 1 9 4 5 4]
# [1 6 8 3 4 5 0]]
# [[2 2]
# [3 2]
# [2 4]]
# [[[7 6 7 6]
# [3 6 8 3]
# [3 7 4 7]
# [0 7 2 3]]
# [[3 6 8 3]
# [3 7 4 7]
# [0 7 2 3]
# [8 5 2 0]]
# [[7 6 3 9]
# [8 3 4 8]
# [4 7 4 6]
# [2 3 7 0]]]
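On NumPy 1.20+, an alternative (a sketch, not part of the original answer) is numpy.lib.stride_tricks.sliding_window_view, which exposes every (2*l x 2*l) window as a zero-copy view that can then be picked out with the positions:

from numpy.lib.stride_tricks import sliding_window_view

# windows[r, c] == coeff[r:r + 2*l, c:c + 2*l]
# shape: (nrow - 2*l + 1, ncol - 2*l + 1, 2*l, 2*l)
windows = sliding_window_view(coeff, (2 * l, 2 * l))
# pick the window whose top-left corner is (pos[:, 0] - l, pos[:, 1] - l);
# as above, the first column of pos holds the row index
bat = windows[pos[:, 0] - l, pos[:, 1] - l]  # shape (n, 2*l, 2*l), same result as coeff[i, j]

The fancy indexing in the last line copies the selected windows, so bat is a regular array rather than a view.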