In Python, if I define a 2D NumPy array and set its second row to np.nan, every element of that row becomes -9223372036854775808 instead of a missing value. Here is an example:
import numpy as np
b = np.array(
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]])
b[1, :] = np.nan
print(b)
[[                   0                    0                    0
                     0                    0                    0
                     0                    0                    0
                     5]
 [-9223372036854775808 -9223372036854775808 -9223372036854775808
  -9223372036854775808 -9223372036854775808 -9223372036854775808
  -9223372036854775808 -9223372036854775808 -9223372036854775808
  -9223372036854775808]
 [                   0                    0                    0
                     3                    6                    6
                     6                    6                    6
                     6]
 [                   0                    0                    3
                     4                    6                    6
                     6                    6                    6
                     6]
 [                   0                    1                    2
                     4                    4                    4
                     4                    4                    4
                     4]]
Does anyone know why this happens? And how should I correctly set one row to np.nan?
For reference, I am running this code in a Python 3.7.10 environment created by mamba on Ubuntu 16.04.7 LTS (GNU/Linux 4.15.0-132-generic x86_64).
np.nan is a special floating-point value that cannot be stored in integer arrays. Since b is an array of integers, b[1, :] = np.nan attempts to convert np.nan to an integer, which is undefined behavior. See this for a discussion of a similar issue.
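A quick way to see where the odd number comes from: -9223372036854775808 is exactly the smallest value of a 64-bit integer, which you can query with np.iinfo. This is a small illustration, assuming a platform where the default integer dtype is int64 (as on the asker's Linux machine); since the NaN-to-int conversion is undefined, the sentinel should not be relied on as a missing-value marker.

```python
import numpy as np

b = np.array([[0, 0, 5], [0, 3, 4]])  # integer dtype inferred from the literals

# The value that replaced NaN is the minimum of the array's integer dtype
print(b.dtype, np.iinfo(b.dtype).min)
print(np.iinfo(np.int64).min)  # smallest int64 value
print(np.iinfo(np.int32).min)  # smallest int32 value
```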
You initialised your array with integers. Integers have no "nan" value, so the NaN ends up converted to the minimum integer value. A quick fix is to create the array with a float dtype, which can hold NaN:
import numpy as np
b = np.array(
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]], dtype=float)  # np.float is deprecated; use float or np.float64
b[1, :] = np.nan
print(b)
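If the integer array already exists (for example, it came from another function), you can convert it instead of rebuilding it. A minimal sketch, using a smaller array than the question's for brevity: astype(float) returns a float64 copy, after which assigning NaN behaves as expected.

```python
import numpy as np

b = np.array([[0, 0, 0, 5],
              [0, 3, 4, 4],
              [0, 1, 2, 4]])  # integer dtype

# astype returns a new float64 array; NaN is representable there
b = b.astype(float)
b[1, :] = np.nan

print(b)
print(np.isnan(b).sum(axis=1))  # number of NaNs per row
```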
First of all, NaN is a special value that only exists for float arrays.
I ran your code with Python 3.8 (64-bit) on 64-bit Windows:
import numpy as np
b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]])
b[1, :] = np.nan
print(b)
This is what I got:
[[          0           0           0           0           0           0
            0           0           0           5]
 [-2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648
  -2147483648 -2147483648 -2147483648 -2147483648]
 [          0           0           0           3           6           6
            6           6           6           6]
 [          0           0           3           4           6           6
            6           6           6           6]
 [          0           1           2           4           4           4
            4           4           4           4]]
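With an int array, I got the lower bound of the integer type in place of NaN, and you are getting the same thing; only the bound depends on your environment (-2147483648 is the int32 minimum, -9223372036854775808 the int64 minimum, and NumPy 1.x uses 32-bit default integers on Windows).
So use a float array instead of an int array: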
import numpy as np
b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
[0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
[0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
[0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
[0, 1, 2, 4, 4, 4, 4, 4, 4, 4]], dtype=float)
b[1, :] = np.nan  # now stored as real NaN values
print(b)
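If you need to keep the integer values and dtype, one alternative not mentioned in the answers above is a NumPy masked array: numpy.ma keeps the data as integers and records the missing entries in a separate boolean mask. A minimal sketch, on a smaller array than the question's:

```python
import numpy as np

b = np.array([[0, 0, 0, 5],
              [0, 3, 4, 4],
              [0, 1, 2, 4]])

# Wrap the integer array; nothing is masked yet
mb = np.ma.masked_array(b)
# Assigning np.ma.masked marks the whole second row as missing
mb[1, :] = np.ma.masked

print(mb)
print(mb.dtype)         # still an integer dtype
print(mb.mean(axis=0))  # masked entries are skipped in reductions
```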
I wonder if there is a way to make MultiLabelBinarizer in sklearn output a specific number of columns. For example, take the code below:
from sklearn.preprocessing import MultiLabelBinarizer
y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
MultiLabelBinarizer().fit_transform(y)
We get 5 columns, because the labels present are 0, 1, ..., 4:
array([[0, 0, 1, 1, 1],
[0, 0, 1, 0, 0],
[1, 1, 0, 1, 0],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 0]])
My question is: how can I get a specific number of columns for this array, for example 6, so that the result is:
array([[0, 0, 1, 1, 1, 0],
[0, 0, 1, 0, 0, 0],
[1, 1, 0, 1, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0]])
Is there a way to do this in sklearn, or with another Python method or module that produces this result easily, or do I have to build the array with my own algorithm?
Any ideas will be much appreciated. Thanks.
MultiLabelBinarizer accepts a parameter classes, in which you indicate the ordering of the classes. Providing a class that does not appear in y adds an extra column of zeros:
from sklearn.preprocessing import MultiLabelBinarizer
y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
print(MultiLabelBinarizer(classes=[0, 1, 2, 3, 4, 5]).fit_transform(y))
# output:
[[0 0 1 1 1 0]
 [0 0 1 0 0 0]
 [1 1 0 1 0 0]
 [1 1 1 1 1 0]
 [1 1 1 0 0 0]]
Note that since the parameter is actually meant to indicate the ordering of the classes, the order of the sequence you provide matters. Further, if you provide too few classes, the unknown labels are ignored (scikit-learn emits a warning) and do not appear in the transformed array.
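For the "own algorithm" route from the question, a small NumPy helper is enough. This is a sketch, not part of scikit-learn: multilabel_onehot is a hypothetical name, and it assumes every label is an integer in range(n_classes). Within scikit-learn, the equivalent of the answer above for any width n is classes=list(range(n)).

```python
import numpy as np

def multilabel_onehot(y, n_classes):
    """Hypothetical helper: binarize label lists into a (len(y), n_classes) 0/1 array.

    Assumes every label is an integer in range(n_classes).
    """
    out = np.zeros((len(y), n_classes), dtype=int)
    for row, labels in enumerate(y):
        # Fancy indexing sets every listed column of this row at once
        out[row, list(labels)] = 1
    return out

y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
print(multilabel_onehot(y, 6))
```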
I would like to multiply the following matrices (using NumPy) in the most efficient way.
This is the code for the matrices:
a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
m = np.array([[1, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1]])
Here they are printed for readability:
a:
[[1 5]
[2 6]
[3 7]
[4 8]]
m:
[[1 0 0 1]
[1 0 1 0]
[0 1 0 1]
[0 1 1 1]]
I want to multiply a by the first column of matrix m (row i of a is scaled by m[i, 0]), like this:
   a          m[:,0]        x0
[[1 5]        [[1]        [[1 5]
 [2 6]   *     [1]   =     [2 6]
 [3 7]         [0]         [0 0]
 [4 8]]        [0]]        [0 0]]
Then I want to multiply a by the second column of m in the same way:
a * m[:,1] = x1
And likewise for the 3rd and 4th columns:
a * m[:,2] = x2
a * m[:,3] = x3
Finally, I want to put the resulting matrices x0, x1, x2, x3 side by side in one matrix:
X = [x0, x1, x2, x3]
The size of X in this example is 4 x 8.
The final result in this example is:
X =
[[1 5 0 0 0 0 1 5]
 [2 6 0 0 2 6 0 0]
 [0 0 3 7 0 0 3 7]
 [0 0 4 8 4 8 4 8]]
I would like to know how to do this with NumPy built-in functions (or generators) instead of two for loops, if possible.
This is just an example. In reality the matrices are large, and it is important that the multiplications are done as fast as possible.
Thank you
You can achieve this with broadcasting and a reshape:
arr_out = (a[:,None] * m[...,None]).reshape(4,8)
array([[1, 5, 0, 0, 0, 0, 1, 5],
[2, 6, 0, 0, 2, 6, 0, 0],
[0, 0, 3, 7, 0, 0, 3, 7],
[0, 0, 4, 8, 4, 8, 4, 8]])
You could transpose m and expand its dimensions to get the wanted result:
m.T[...,None] * a
array([[[1, 5],
        [2, 6],
        [0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0],
        [3, 7],
        [4, 8]],
...
If you want to stack the arrays horizontally to end up with a 2D array, use np.hstack:
np.hstack(m.T[...,None] * a)
array([[1, 5, 0, 0, 0, 0, 1, 5],
[2, 6, 0, 0, 2, 6, 0, 0],
[0, 0, 3, 7, 0, 0, 3, 7],
[0, 0, 4, 8, 4, 8, 4, 8]])
Or, without hard-coding the shape, reshape as:
(a[:,None] * m[...,None]).reshape(m.shape[0], -1)
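Wrapped as a reusable function for arbitrary shapes, the broadcasting approach looks like the sketch below; the function names are hypothetical. The second function spells the same product with np.einsum, a swapped-in alternative not used in the answers. Which one is faster depends on your sizes, so it is worth timing both (e.g. with timeit) on real data.

```python
import numpy as np

def scale_by_columns(a, m):
    """Stack a * m[:, j, None] for every column j of m side by side.

    a: shape (n, k); m: shape (n, p). Result: shape (n, p * k).
    """
    # (n, 1, k) * (n, p, 1) broadcasts to (n, p, k): out[i, j, :] = m[i, j] * a[i, :]
    blocks = a[:, None, :] * m[:, :, None]
    # Merge the last two axes so block j fills columns j*k .. j*k + k - 1
    return blocks.reshape(a.shape[0], -1)

def scale_by_columns_einsum(a, m):
    # Same product written with einsum: out[i, j, k] = m[i, j] * a[i, k]
    return np.einsum("ij,ik->ijk", m, a).reshape(a.shape[0], -1)

a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
m = np.array([[1, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1]])

X = scale_by_columns(a, m)
print(X)
print(np.array_equal(X, scale_by_columns_einsum(a, m)))
```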
This is the answer I was looking for. Thank you, Yatu and hpaulj.
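import numpy as np
a = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
m = np.array([[1, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1]])
X = m.T[...,None] * a              # shape (4, 4, 2); X[j] is x_j from the question
for i in range(4):
    reshaped = np.hstack(X[i,:,:])  # joins the rows of x_i into one 1D array of length 8
reshaped_simpler = np.hstack(X)     # puts x0..x3 side by side, shape (4, 8)
print(reshaped_simpler)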
I got the rest of the answer from the following link:
reshape numpy 3D array to 2D
I rearranged the for loop because I got a warning that support for generators is going to be deprecated in future versions of NumPy.
The goal is to create all paths in which each node is visited at most once. A 1 means a direct route exists between two nodes and a 0 means no route exists.
Example:
Say I have the matrix:
   1 2 3 4 5 6 7
1 [0 1 0 1 0 0 1]
2 [1 0 1 0 0 1 0]
3 [0 1 0 0 1 0 0]
4 [1 0 0 0 0 0 0]
5 [0 0 1 0 0 0 0]
6 [0 1 0 0 0 0 1]
7 [1 0 0 0 0 1 0]
For the sake of this example, let's start from node 4.
Then all the possible paths would be (sort of like the travelling salesman problem):
[4,1,2,3,5]
[4,1,2,6,7]
[4,1,7,6,2,3,5]
I am storing the matrix as a two-dimensional list:
M0 =[[0, 1, 0, 1, 0, 0, 1],
[1, 0, 1, 0, 0, 1, 0],
[0, 1, 0, 0, 1, 0, 0],
[1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 1, 0]]
Here is what I was trying. It doesn't work because it does not backtrack to the most recent coordinate where the path branched.
I also suspect this is not the best approach: I was trying to implement some variant of a greedy method. There must be a better way than trying to fix this code.
path=[]
allPaths=[]
visited=[]
branchAt=[]
def route():
    tempMatrix = genMat() #set this to the above example matrix if you trying to run this code
    column = 3 #Starting with a static column for now
    #while(column+1 not in path):
    for counter in range(0,20): #Static loop as well
        path.append(column+1)
        delMat(tempMatrix[column]) #Sets all the elements to 0 in that row of the tempMatrix so same path isn't visited twice in subsequent interations (ie when in another column)
        oneCount=0 #check if path can branch out in the current column (aka more than one 1's)
        for row in range(0,len(matrix)):
            if(tempMatrix[row][column]==1):
                oneCount+=1
        for row in range(0,len(matrix)):
            coOrds=([column+1,row+1])
            if(tempMatrix[row][column]==1):
                #if (coOrds) not in visited and oneCount>1:
                if oneCount>1 and coOrds in visited:
                    continue
                else:
                    if(oneCount>1):
                        visited.append(coOrds)
                        if len(branchAt)<1:
                            branchAt.append(coOrds)
                    column=row
                    delMat(tempMatrix[row])
                    break
                # else:
                #     if(oneCount>1):
                #         continue
                #     else:
                #         continue
            else:
                if(row==len(matrix)-1):
                    print("Final Path: ",path)
                    allPaths.append(tuple(path))
                    bran.clear()
                    break
    print("allPaths: ", allPaths)
    print("Visited: ", visited)
Use the algorithm from https://www.geeksforgeeks.org/find-paths-given-source-destination/
I modified the printing of paths to use your node numbering (i.e. starting at 1 rather than 0).
from collections import defaultdict

# This class represents a directed graph
# using adjacency list representation
class Graph:

    def __init__(self, vertices):
        # No. of vertices
        self.V = vertices
        # default dictionary to store graph
        self.graph = defaultdict(list)

    # function to add an edge to graph
    def addEdge(self, u, v):
        self.graph[u].append(v)

    def printAllPathsUtil(self, u, d, visited, path):
        '''A recursive function to print all paths from 'u' to 'd'.
        visited[] keeps track of vertices in the current path;
        path[] stores the actual vertices (1-based, for printing).'''
        # Mark the current node as visited and store in path
        visited[u] = True
        path.append(u + 1)  # add 1 so print output starts at 1

        # If current vertex is same as destination, then print
        # current path[]
        if u == d:
            print(path)
        else:
            # If current vertex is not destination
            # Recur for all the vertices adjacent to this vertex
            for i in self.graph[u]:
                if not visited[i]:
                    self.printAllPathsUtil(i, d, visited, path)

        # Remove current vertex from path[] and mark it as unvisited
        path.pop()
        visited[u] = False

    # Prints all paths from 's' to 'd'
    def printAllPaths(self, s, d):
        # Mark all the vertices as not visited
        visited = [False] * self.V
        # Create an array to store paths
        path = []
        # Call the recursive helper function to print all paths
        self.printAllPathsUtil(s, d, visited, path)


def generate_paths(A):
    g = Graph(len(A))
    # We loop over all row, column combinations and add edge
    # if there is a connection between the two nodes
    for row in range(len(A)):
        for column in range(len(A[0])):
            if A[row][column] == 1:
                g.addEdge(row, column)

    for row in range(len(A)):
        # show row+1, so row numbering prints starting with 1
        print(f"Following are all different paths starting at {row+1}")
        for column in range(row + 1, len(A[0])):
            g.printAllPaths(row, column)


A = [[0, 1, 0, 1, 0, 0, 1],
     [1, 0, 1, 0, 0, 1, 0],
     [0, 1, 0, 0, 1, 0, 0],
     [1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 1],
     [1, 0, 0, 0, 0, 1, 0]]
generate_paths(A)
Output
Following are all different paths starting at 1
[1, 2]
[1, 7, 6, 2]
[1, 2, 3]
[1, 7, 6, 2, 3]
[1, 4]
[1, 2, 3, 5]
[1, 7, 6, 2, 3, 5]
[1, 2, 6]
[1, 7, 6]
[1, 2, 6, 7]
[1, 7]
Following are all different paths starting at 2
[2, 3]
[2, 1, 4]
[2, 6, 7, 1, 4]
[2, 3, 5]
[2, 1, 7, 6]
[2, 6]
[2, 1, 7]
[2, 6, 7]
Following are all different paths starting at 3
[3, 2, 1, 4]
[3, 2, 6, 7, 1, 4]
[3, 5]
[3, 2, 1, 7, 6]
[3, 2, 6]
[3, 2, 1, 7]
[3, 2, 6, 7]
Following are all different paths starting at 4
[4, 1, 2, 3, 5]
[4, 1, 7, 6, 2, 3, 5]
[4, 1, 2, 6]
[4, 1, 7, 6]
[4, 1, 2, 6, 7]
[4, 1, 7]
Following are all different paths starting at 5
[5, 3, 2, 1, 7, 6]
[5, 3, 2, 6]
[5, 3, 2, 1, 7]
[5, 3, 2, 6, 7]
Following are all different paths starting at 6
[6, 2, 1, 7]
[6, 7]
Following are all different paths starting at 7
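The code above prints every path between pairs of nodes. If you only want the maximal paths from one start node, exactly as listed in the question ([4,1,2,3,5], [4,1,2,6,7], [4,1,7,6,2,3,5]), a depth-first search written as a generator with explicit backtracking is one way to get them. This is a sketch of my own, not the GeeksforGeeks code: maximal_paths is a hypothetical name, start is a 0-based index, and the yielded paths are 1-based to match the question's numbering.

```python
def maximal_paths(adj, start):
    """Yield every simple path from `start` that cannot be extended further.

    adj is a 0/1 adjacency matrix (list of lists); start is 0-based,
    yielded node numbers are 1-based.
    """
    n = len(adj)
    path = [start]
    on_path = [False] * n
    on_path[start] = True

    def dfs(u):
        extended = False
        for v in range(n):
            if adj[u][v] == 1 and not on_path[v]:
                extended = True
                on_path[v] = True
                path.append(v)
                yield from dfs(v)
                # Backtrack: undo the step to v before trying the next neighbour
                path.pop()
                on_path[v] = False
        if not extended:
            # Dead end: no unvisited neighbour, so this path is maximal
            yield [node + 1 for node in path]

    yield from dfs(start)


M0 = [[0, 1, 0, 1, 0, 0, 1],
      [1, 0, 1, 0, 0, 1, 0],
      [0, 1, 0, 0, 1, 0, 0],
      [1, 0, 0, 0, 0, 0, 0],
      [0, 0, 1, 0, 0, 0, 0],
      [0, 1, 0, 0, 0, 0, 1],
      [1, 0, 0, 0, 0, 1, 0]]

for p in maximal_paths(M0, 3):  # node 4 in the question's numbering
    print(p)
```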
I created a matrix using:
Matrix = [[0 for x in range(5)] for z in range(5)]
I am trying to extract the elements above the diagonal and store them in a list.
For example, for the matrix
[[0, 0, 0, 1, 1],
 [1, 0, 0, 0, 0],
 [1, 1, 0, 0, 1],
 [0, 1, 1, 0, 0],
 [0, 1, 0, 1, 0]]
U = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
and for
A = [[1, 4, 9],
     [0, 1, 2],
     [2, 3, 6]]
U = [4, 9, 2]
You can just use a list comprehension:
from random import randrange
Matrix = [[randrange(10) for x in range(5)] for z in range(5)]
>>>Matrix
[[6, 3, 7, 9, 3], [8, 6, 4, 0, 4], [0, 0, 1, 3, 2], [7, 7, 2, 3, 7], [3, 3, 5, 6, 3]]
[Matrix[i][j] for i in range(0,5) for j in range(i+1,5)]
[3, 7, 9, 3, 4, 0, 4, 3, 2, 7]
Here is a solution. I changed your matrix to hold random numbers so that you can more easily see which numbers are taken into account. triu (upper triangle) is a function that takes a matrix in your format and returns the elements of its upper triangle, i.e. those above the diagonal.
import numpy as np
from random import randrange

Matrix = [[randrange(10) for x in range(5)] for z in range(5)]

def triu(matrix):
    U = list()
    diagLine = 0                       # column index of the diagonal in the current row
    for row in matrix:                 # iterate over the argument, not the global Matrix
        colCounter = 0
        for col in row:
            if colCounter > diagLine:  # strictly to the right of the diagonal
                U.append(col)
            colCounter += 1
        diagLine += 1
    return U

print(np.array(Matrix))
print(triu(Matrix))
Result:
[[0 0 2 4 0]
 [6 4 8 9 0]
 [6 2 2 3 0]
 [2 9 6 5 5]
 [1 5 8 9 2]]
[0, 2, 4, 0, 8, 9, 0, 3, 0, 5]
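Since the second answer already hints at NumPy, the same extraction can be done without explicit loops using np.triu_indices, which returns the row and column indices of the upper triangle in row-major order; k=1 excludes the diagonal itself. A minimal sketch (upper_triangle is a hypothetical name), checked against both examples from the question:

```python
import numpy as np

def upper_triangle(matrix):
    """Return the elements strictly above the main diagonal, row by row."""
    a = np.asarray(matrix)
    # k=1 skips the diagonal; m= allows non-square inputs
    rows, cols = np.triu_indices(a.shape[0], k=1, m=a.shape[1])
    return a[rows, cols].tolist()

print(upper_triangle([[1, 4, 9],
                      [0, 1, 2],
                      [2, 3, 6]]))
```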