n*n matrix of 0s and 1s in python [duplicate]

This question already has answers here:
How to make a checkerboard in numpy?
(28 answers)
Closed 4 years ago.
How do I create an n*n checkerboard matrix with alternating 0 and 1 values, using the tile function?
For example:
when n has a value of 2, the output should be:
[[0 1]
[1 0]]
I am able to create a matrix of 0s and 1s, but they do not alternate between rows. Below is what I tried:
import numpy as np
n = 4
arr = [0, 1]
print(np.tile(arr,(n,n//2)))
The output I got:
[[0 1 0 1]
 [0 1 0 1]
 [0 1 0 1]
 [0 1 0 1]]
The output I want:
[[0 1 0 1]
 [1 0 1 0]
 [0 1 0 1]
 [1 0 1 0]]

A simple way using numpy is to define a length-n vector of alternating 0s and 1s and take advantage of broadcasting to create an n x n checkerboard:
def checkerboard(n):
    a = np.resize([0, 1], n)
    return np.abs(a - np.array([a]).T)
Sample use -
checkerboard(2)
array([[0, 1],
       [1, 0]])
checkerboard(4)
array([[0, 1, 0, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [1, 0, 1, 0]])
Details -
The above works by first creating a length-n 1D vector of alternating 0s and 1s using np.resize:
import numpy as np
n = 3
np.resize([0,1], n)
array([0, 1, 0])
Then we subtract its 2D transpose from it, which broadcasts to an array of shape (n, n) containing 0s, 1s and -1s (shown here for n = 4):
a - np.array([a]).T
array([[ 0,  1,  0,  1],
       [-1,  0, -1,  0],
       [ 0,  1,  0,  1],
       [-1,  0, -1,  0]])
We just need to take the absolute value of it and we have a checkerboard matrix.
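As an aside (not part of the answer above), the same pattern can be cross-checked by building it directly from index parity; a minimal sketch, with the helper name chosen purely for illustration:
import numpy as np

def checkerboard_parity(n):
    # a cell is 1 exactly when its row index + column index is odd
    return np.indices((n, n)).sum(axis=0) % 2

print(checkerboard_parity(4))
# [[0 1 0 1]
#  [1 0 1 0]
#  [0 1 0 1]
#  [1 0 1 0]]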

You could use NumPy slice assignment; there is no need for np.tile:
import numpy as np
def tiling(n):
    result = np.zeros((n, n))
    result[::2, 1::2] = 1
    result[1::2, ::2] = 1
    return result
print(tiling(2))
print()
print(tiling(4))
Output
[[0. 1.]
 [1. 0.]]

[[0. 1. 0. 1.]
 [1. 0. 1. 0.]
 [0. 1. 0. 1.]
 [1. 0. 1. 0.]]

Here is a one-line numpy solution. That said, I think Daniel's answer is much more readable and probably more efficient.
If n is odd, then np.arange(n*n).reshape(n,n)%2 gives the correct result. However, if n is even, every row will be identical (like your result). We can fix this by subtracting one from every other row before taking the modulo.
tile = (np.arange(n*n).reshape(n,n)-np.arange(n).reshape(n,1)*(n%2+1))%2
Equivalently,
tile = (np.arange(n*n).reshape(n,n,order='F')-np.arange(n)*(n+1))%2
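For reference, a quick sanity check of the first one-liner wrapped in a function (the wrapper name is just for illustration):
import numpy as np

def checkerboard_oneliner(n):
    # the row-index term changes the parity of every other row only when n is even
    # (n % 2 + 1 == 1); for odd n it subtracts an even amount and changes nothing
    return (np.arange(n * n).reshape(n, n) - np.arange(n).reshape(n, 1) * (n % 2 + 1)) % 2

print(checkerboard_oneliner(4))
# [[0 1 0 1]
#  [1 0 1 0]
#  [0 1 0 1]
#  [1 0 1 0]]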

Related

Constructing a confusion matrix from data without sklearn

I am trying to construct a confusion matrix without using the sklearn library. I am having trouble correctly forming the confusion matrix. Here's my code:
def comp_confmat():
    currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
    predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]
    cm = []
    classes = int(max(currentDataClass) - min(currentDataClass)) + 1  # find number of classes
    for c1 in range(1, classes + 1):  # for every true class
        counts = []
        for c2 in range(1, classes + 1):  # for every predicted class
            count = 0
            for p in range(len(currentDataClass)):
                if currentDataClass[p] == predictedClass[p]:
                    count += 1
            counts.append(count)
        cm.append(counts)
    print(np.reshape(cm, (classes, classes)))
However this returns:
[[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]
[7 7 7 7 7]]
But I don't understand why every entry comes out as 7 when I am resetting the count each time and looping over different values.
This is what I should be getting (using sklearn's confusion_matrix function):
[[3 0 0 0 1]
[2 1 0 1 0]
[0 1 3 0 0]
[0 1 0 0 0]
[0 1 1 0 0]]
You can derive the confusion matrix by counting the number of instances in each combination of actual and predicted classes as follows:
import numpy as np
def comp_confmat(actual, predicted):
    # ensure array inputs so the elementwise comparisons below behave as expected for plain lists
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    # extract the different classes
    classes = np.unique(actual)
    # initialize the confusion matrix
    confmat = np.zeros((len(classes), len(classes)))
    # loop across the different combinations of actual / predicted classes
    for i in range(len(classes)):
        for j in range(len(classes)):
            # count the number of instances in each combination of actual / predicted classes
            confmat[i, j] = np.sum((actual == classes[i]) & (predicted == classes[j]))
    return confmat
# sample data
actual = [1, 3, 3, 2, 5, 5, 3, 2, 1, 4, 3, 2, 1, 1, 2]
predicted = [1, 2, 3, 4, 2, 3, 3, 2, 1, 2, 3, 1, 5, 1, 1]
# confusion matrix
print(comp_confmat(actual, predicted))
# [[3. 0. 0. 0. 1.]
# [2. 1. 0. 1. 0.]
# [0. 1. 3. 0. 0.]
# [0. 1. 0. 0. 0.]
# [0. 1. 1. 0. 0.]]
In your innermost loop, the condition needs to involve c1 and c2: as written, it counts every sample where the true and predicted labels agree (there are 7 of them), regardless of which cell you are filling, so every entry becomes 7. You only want to count a sample when currentDataClass[p] == c1 and predictedClass[p] == c2.
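A minimal corrected version of that loop, wrapped in a function so it is self-contained (the function name is hypothetical):
import numpy as np

def comp_confmat_fixed(currentDataClass, predictedClass):
    classes = int(max(currentDataClass) - min(currentDataClass)) + 1
    cm = []
    for c1 in range(1, classes + 1):        # for every true class
        counts = []
        for c2 in range(1, classes + 1):    # for every predicted class
            # count only samples whose true class is c1 AND predicted class is c2
            counts.append(sum(1 for t, p in zip(currentDataClass, predictedClass)
                              if t == c1 and p == c2))
        cm.append(counts)
    return np.array(cm)

print(comp_confmat_fixed([1,3,3,2,5,5,3,2,1,4,3,2,1,1,2],
                         [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]))
# [[3 0 0 0 1]
#  [2 1 0 1 0]
#  [0 1 3 0 0]
#  [0 1 0 0 0]
#  [0 1 1 0 0]]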
Here's another way, using nested list comprehensions:
currentDataClass = [1,3,3,2,5,5,3,2,1,4,3,2,1,1,2]
predictedClass = [1,2,3,4,2,3,3,2,1,2,3,1,5,1,1]
classes = int(max(currentDataClass) - min(currentDataClass)) + 1 #find number of classes
counts = [[sum([(currentDataClass[i] == true_class) and (predictedClass[i] == pred_class)
                for i in range(len(currentDataClass))])
           for pred_class in range(1, classes + 1)]
          for true_class in range(1, classes + 1)]
counts
[[3, 0, 0, 0, 1],
 [2, 1, 0, 1, 0],
 [0, 1, 3, 0, 0],
 [0, 1, 0, 0, 0],
 [0, 1, 1, 0, 0]]
Here is my solution using numpy and pandas:
import numpy as np
import pandas as pd

true_classes = [1, 3, 3, 2, 5, 5, 3, 2, 1, 4, 3, 2, 1, 1, 2]
predicted_classes = [1, 2, 3, 4, 2, 3, 3, 2, 1, 2, 3, 1, 5, 1, 1]
classes = sorted(set(true_classes))  # sorted list of labels (a plain set cannot be used as an index)
number_of_classes = len(classes)
conf_matrix = pd.DataFrame(
    np.zeros((number_of_classes, number_of_classes), dtype=int),
    index=classes,
    columns=classes)
for true_label, prediction in zip(true_classes, predicted_classes):
    # Each pair of (true_label, prediction) is a position in the confusion matrix (row, column).
    # Basically we are counting how many times each pair occurs; the count is placed at
    # matrix position (true_label/row, prediction/column).
    conf_matrix.loc[true_label, prediction] += 1
print(conf_matrix.values)
[[3 0 0 0 1]
[2 1 0 1 0]
[0 1 3 0 0]
[0 1 0 0 0]
[0 1 1 0 0]]

Slice and fill 2d array with 1d array of column indices [duplicate]

This question already has an answer here:
Slicing 2d array from values in 1d array onwards
(1 answer)
Closed 3 years ago.
First, here's the 1-d analog of what I'm trying to do.
Suppose I have a 1d array of 0s and I want to replace every 0 from index 2 onward with a 1. I can do this as follows:
import numpy as np
x = np.array([0,0,0,0])
i = 2
x[i:] = 1
print(x) # [0 0 1 1]
Now, I'm trying to figure out the 2d version of this operation. To start, I have a 5x4 array of 0s like
foo = np.zeros(shape = (5,4))
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
and a corresponding 5 element array of column indices like
fill_locs = np.array([0, 3, 1, 1, 2])
For each row of foo, I want to fill columns i: with 1s where i is the index given by fill_locs. foo[fill_locs.reshape(-1, 1):] = 1 feels right, but doesn't work.
My desired output should look like
expected_result = np.array([
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])
You don't need slicing, and you don't need to create the original array. You can accomplish all of this with broadcasted comparison.
a = np.array([0, 3, 1, 1, 2])
n = 4
(a[:, None] <= np.arange(n)).view('i1')
array([[1, 1, 1, 1],
       [0, 0, 0, 1],
       [0, 1, 1, 1],
       [0, 1, 1, 1],
       [0, 0, 1, 1]], dtype=int8)
Or using np.less_equal.outer:
np.less_equal.outer(a, np.arange(n)).view('i1')
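If you do want to fill an existing array of zeros in place (closer to the original attempt), the same comparison can be used as a boolean mask; a sketch, assuming the 5x4 example from the question:
import numpy as np

foo = np.zeros((5, 4))
fill_locs = np.array([0, 3, 1, 1, 2])
# the mask is True at every column index >= the row's fill location
mask = np.arange(foo.shape[1]) >= fill_locs[:, None]
foo[mask] = 1
print(foo)
# [[1. 1. 1. 1.]
#  [0. 0. 0. 1.]
#  [0. 1. 1. 1.]
#  [0. 1. 1. 1.]
#  [0. 0. 1. 1.]]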

Numpy broadcast array to smaller array with exact position for every row

Consider this example matrix:
[[0 1 2 1 0]
[1 1 2 1 0]
[0 1 0 0 0]
[1 2 1 0 0]
[1 2 2 3 2]]
What I need to do:
find the maximum in every row
select a small surrounding of the maximum from every row (3 values in this case)
paste the surrounding of the maximum into a new (narrower) array
For the example above, the result is:
[[ 1. 2. 1.]
[ 1. 2. 1.]
[ 0. 1. 0.]
[ 1. 2. 1.]
[ 2. 3. 2.]]
My current working code:
import numpy as np
A = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [1, 2, 2, 3, 2],
])
b = A.argmax(axis=1)
C = np.zeros((len(A), 3))
for idx, loc, row in zip(range(len(A)), b, A):
    print(idx, loc, row)
    C[idx] = row[loc-1:loc+2]
print(C)
My question:
How to get rid of the for loop and replace it with some cheaper numpy operation?
Note:
This algorithm is for straightening broken "lines" in video stream frames with thousands of rows.
Approach #1
We can have a vectorized solution based on setting up sliding windows and then indexing into them with b-offset indices to get the desired output. We can leverage scikit-image's view_as_windows, which is built on np.lib.stride_tricks.as_strided, to get the sliding windows.
The implementation would be -
from skimage.util.shape import view_as_windows

L = 3  # window length
w = view_as_windows(A, (1, L))[..., 0, :]
Cout = w[np.arange(len(b)), b - L//2]
Being a view-based method, this has the advantage of being memory-efficient and hence good on performance too.
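If scikit-image is not available, NumPy 1.20+ provides an equivalent view-based helper; a sketch of the same idea, assuming A and b as defined in the question:
import numpy as np

L = 3  # window length
# sliding windows of length L along each row; shape (rows, cols - L + 1, L)
w = np.lib.stride_tricks.sliding_window_view(A, L, axis=1)
Cout = w[np.arange(len(b)), b - L // 2]
print(Cout)
# [[1 2 1]
#  [1 2 1]
#  [0 1 0]
#  [1 2 1]
#  [2 3 2]]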
Approach #2
Alternatively, a one-liner by creating all those indices with outer-addition would be -
A[np.arange(len(b))[:,None],b[:,None] + np.arange(-(L//2),L//2+1)]
This works by making an array with all the desired indices, but using that directly on A somehow results in a 3D array, hence the subsequent indexing... Probably not optimal, but definitely another way of doing it!
import numpy as np
A = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [1, 2, 2, 3, 2],
])
b = A.argmax(axis=1).reshape(-1, 1)
index = b + np.arange(-1, 2, 1).reshape(1, -1)
A[:, index][np.arange(b.size), np.arange(b.size)]
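A small aside (not part of the answer above): the intermediate 3D array can be avoided entirely with np.take_along_axis, which pairs each row of A with its own row of indices:
import numpy as np

A = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [1, 2, 2, 3, 2],
])
b = A.argmax(axis=1).reshape(-1, 1)
index = b + np.arange(-1, 2).reshape(1, -1)
print(np.take_along_axis(A, index, axis=1))
# [[1 2 1]
#  [1 2 1]
#  [0 1 0]
#  [1 2 1]
#  [2 3 2]]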

Faster way to "distribute" values of an ndarray into other ndarrays based on assignments?

Generally, I'm trying to split a distance matrix into K folds. Specifically, for the 3 x 3 case, my distance matrix might look like this:
full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
I also have a list of randomly generated assignments, the length of which is equal to the sum over all elements in the distance matrix. For the K = 3 case, it might look like this:
assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])
I want to create K = 3 new 3 x 3 matrices of zeros, in which the values of the distance matrix are "distributed" according to the assignments list. Code is more precise than words, so here's my current attempt:
def assign(full, assignments):
    folds = [np.zeros(full.shape) for _ in xrange(np.max(assignments) + 1)]
    rows, cols = full.shape
    a = 0
    for r in xrange(rows):
        for c in xrange(cols):
            for i in xrange(full[r, c]):
                folds[assignments[a]][r, c] += 1
                a += 1
    return folds
This works (slowly), and in this example,
folds = assign(full, assignments)
for f in folds:
print f
returns
[[ 0. 0. 2.]
[ 0. 0. 0.]
[ 1. 1. 0.]]
[[ 0. 0. 1.]
[ 0. 0. 1.]
[ 1. 0. 0.]]
[[ 0. 0. 0.]
[ 1. 0. 0.]
[ 0. 0. 0.]]
as desired. However, my current attempt is very slow, especially for the N x N case for N large. How can I improve the speed of this function? Is there some numpy magic that I should be using here?
One idea I had was converting to a sparse matrix and looping over nonzero entries, but that would only help a bit.
You can use np.add.at to do an unbuffered in-place operation:
import numpy as np
full = np.array([
    [0, 0, 3],
    [1, 0, 1],
    [2, 1, 0]
])
assignments = np.array([0, 1, 0, 2, 1, 1, 0, 0])
res = np.zeros((np.max(assignments) + 1,) + full.shape, dtype=int)
r, c = np.nonzero(full)
n = full[r, c]
r = np.repeat(r, n)
c = np.repeat(c, n)
np.add.at(res, (assignments, r, c), 1)
print(res)
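The reason for np.add.at rather than plain fancy-index assignment is that res[assignments, r, c] += 1 is buffered, so repeated index triples would only be counted once; a small illustration on a throwaway array:
import numpy as np

idx = np.array([0, 0, 1])    # index 0 appears twice

x = np.zeros(3, dtype=int)
x[idx] += 1                  # buffered: the duplicate increment at index 0 is lost
print(x)                     # [1 1 0]

y = np.zeros(3, dtype=int)
np.add.at(y, idx, 1)         # unbuffered: every increment is applied
print(y)                     # [2 1 0]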
You just need to figure out what item in the flattened output would get incremented each time, then aggregate them with bincount:
def assign(full, assignments):
    assert len(assignments) == np.sum(full)
    rows, cols = full.shape
    n = np.max(assignments) + 1
    full_flat = full.reshape(-1)
    full_flat_non_zero = full_flat != 0
    full_flat_indices = np.repeat(np.where(full_flat_non_zero)[0],
                                  full_flat[full_flat_non_zero])
    folds_flat_indices = full_flat_indices + assignments*rows*cols
    return np.bincount(folds_flat_indices,
                       minlength=n*rows*cols).reshape(n, rows, cols)
>>> assign(full, assignments)
array([[[0, 0, 2],
        [0, 0, 0],
        [1, 1, 0]],

       [[0, 0, 1],
        [0, 0, 1],
        [1, 0, 0]],

       [[0, 0, 0],
        [1, 0, 0],
        [0, 0, 0]]])
You may want to print out each of those intermediate arrays for your example, to see what exactly is going on.

numpy/scipy build adjacency matrix from weighted edgelist

I'm reading a weighted edgelist / numpy array like:
0 1 1
0 2 1
1 2 1
1 0 1
2 1 4
where the columns are 'User1', 'User2', 'Weight'. I'd like to perform a DFS algorithm with scipy.sparse.csgraph.depth_first_tree, which requires an N x N matrix as input. How can I convert the previous list into a square matrix like:
0 1 1
1 0 1
0 4 0
within numpy or scipy?
Thanks for your help.
EDIT:
I've been working with a huge (150 million nodes) network, so I'm looking for a memory efficient way to do that.
You could use a memory-efficient scipy.sparse matrix:
import numpy as np
import scipy.sparse as sparse
arr = np.array([[0, 1, 1],
                [0, 2, 1],
                [1, 2, 1],
                [1, 0, 1],
                [2, 1, 4]])
shape = tuple(arr.max(axis=0)[:2] + 1)
coo = sparse.coo_matrix((arr[:, 2], (arr[:, 0], arr[:, 1])), shape=shape,
                        dtype=arr.dtype)
print(repr(coo))
# <3x3 sparse matrix of type '<type 'numpy.int64'>'
#   with 5 stored elements in COOrdinate format>
To convert the sparse matrix to a dense numpy array, you could use todense:
print(coo.todense())
# [[0 1 1]
# [1 0 1]
# [0 4 0]]
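Since the stated goal is scipy.sparse.csgraph.depth_first_tree, the COO matrix can be passed on in sparse form without ever building the dense N x N array; a sketch, assuming the coo object from the snippet above and an arbitrary start node of 0:
from scipy.sparse.csgraph import depth_first_tree

# csgraph routines accept sparse input, so a huge graph never needs a dense matrix
tree = depth_first_tree(coo.tocsr(), 0, directed=True)
print(tree.toarray())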
Try something like the following:
import numpy as np
import scipy.sparse as sps

A = np.array([[0, 1, 1], [0, 2, 1], [1, 2, 1], [1, 0, 1], [2, 1, 4]])
i, j, weight = A[:, 0], A[:, 1], A[:, 2]
# dimension of the square matrix: one more than the largest node index
dim = max(i.max(), j.max()) + 1
B = sps.lil_matrix((dim, dim))
for r, c, w in zip(i, j, weight):
    B[r, c] = w
print(B.todense())
>>>
[[ 0.  1.  1.]
 [ 1.  0.  1.]
 [ 0.  4.  0.]]
