I made a few modifications to a Bellman-Ford implementation to collect some data I need to compare, but I can't get that data back out of the printArr method. It prints dist_list correctly, but the returned value never reaches the caller, and building the list with a list comprehension instead doesn't help either.
class Graph:
    def __init__(self, vertices):
        self.V = vertices
        self.graph = []

    def addEdge(self, u, v, w):
        self.graph.append([u, v, w])

    def printArr(self, dist, src):
        #print("Source End Distance")
        dist_list = []
        for i in range(self.V):
            #print("{0}\t{1}\t{2}".format(src, i, dist[i]))
            #print(src, i, dist[i])
            dist_list.append([src, i, dist[i]])
        print(dist_list)
        print(dist_list == [[src, i, dist[i]] for i in range(self.V)])
        return [[src, i, dist[i]] for i in range(self.V)]

    def BellmanFord(self, src):
        dist = [float("Inf")] * self.V
        dist[src] = 0
        for _ in range(self.V - 1):
            for u, v, w in self.graph:
                if dist[u] != float("Inf") and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        for u, v, w in self.graph:
            if dist[u] != float("Inf") and dist[u] + w < dist[v]:
                print("Graph contains negative weight cycle")
                return
        self.printArr(dist, src)

matrix = [[0, 2, 2, 2, -1], [9, 0, 2, 2, -1], [9, 3, 0, 2, -1], [9, 3, 2, 0, -1], [9, 3, 2, 2, 0]]
g = Graph(len(matrix))
[[g.addEdge(i, j, element) for j, element in enumerate(array) if i != j] for i, array in enumerate(matrix)]
print(g.BellmanFord(0))
Output:
[[0, 0, 0], [0, 1, 2], [0, 2, 1], [0, 3, 1], [0, 4, -1]]
True
None
So the print is fine and the two lists compare equal.
Why does the call return None? What am I missing?
The None comes from:
print(g.BellmanFord(0))
BellmanFord never returns anything useful under any circumstances (it either falls off the end and implicitly returns None, or executes a plain return, which also returns None). Remove the print() around the call, and you'll avoid the None output.
Alternatively, change self.printArr(dist, src) to return self.printArr(dist, src) so it does return something useful (assuming the early return, which should probably be raising an exception rather than silently returning None, isn't invoked).
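A minimal sketch of the second suggestion, assuming you want the distances back as a list and prefer an exception over a silent None. The choice of ValueError is my assumption, not something the original code prescribes:

```python
class Graph:
    def __init__(self, vertices):
        self.V = vertices
        self.graph = []

    def addEdge(self, u, v, w):
        self.graph.append([u, v, w])

    def BellmanFord(self, src):
        dist = [float("inf")] * self.V
        dist[src] = 0
        # Relax every edge V - 1 times.
        for _ in range(self.V - 1):
            for u, v, w in self.graph:
                if dist[u] != float("inf") and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        # One more pass: any further improvement means a negative cycle.
        for u, v, w in self.graph:
            if dist[u] != float("inf") and dist[u] + w < dist[v]:
                # Assumption: raise instead of printing and returning None.
                raise ValueError("Graph contains negative weight cycle")
        # Return the result so the caller can actually use it.
        return [[src, i, d] for i, d in enumerate(dist)]


matrix = [[0, 2, 2, 2, -1], [9, 0, 2, 2, -1], [9, 3, 0, 2, -1],
          [9, 3, 2, 0, -1], [9, 3, 2, 2, 0]]
g = Graph(len(matrix))
for i, row in enumerate(matrix):
    for j, weight in enumerate(row):
        if i != j:
            g.addEdge(i, j, weight)

result = g.BellmanFord(0)
print(result)
```

With the return in place, print(g.BellmanFord(0)) shows the list instead of None.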
I have implemented the Discounted Cumulative Gain (DCG) and Normalized Discounted Cumulative Gain (NDCG) in Python. I am not sure whether the code is correct or whether I have missed some important criterion for DCG and NDCG. Here is my code so far:
import numpy as np
def get_dcg_score(predictions: np.ndarray, test_interaction_matrix: np.ndarray, topK = 10) -> float:
    """
    predictions - np.ndarray - predictions of the recommendation algorithm for each user.
    test_interaction_matrix - np.ndarray - test interaction matrix for each user.
    returns - float - mean dcg score over all user.
    """
    score = None
    # TODO: YOUR IMPLEMENTATION.
    score = []
    for idx, (pred,test) in enumerate(zip(predictions,test_interaction_matrix)):
        print(idx,pred,test)
        for i, (j,jj) in enumerate(zip(pred[:topK], test[:topK])):
            if i == 0 and jj == 1:
                sc = jj
                score.append(sc)
            if i != 0 and jj == 1:
                sc = jj / np.log2(j+2)
                score.append(sc)
            if (i != 0 and jj == 0) or ( i == 0 and jj == 0):
                continue
    score = sum(score)/len(predictions)
    return score
I evaluate this on the two arrays.
predictions = np.array([[0, 1, 2, 3], [3, 2, 1, 0]])
test_interaction_matrix = np.array([[1, 0, 0, 0], [0, 0, 0, 1]])
dcg_score = get_dcg_score(predictions, test_interaction_matrix, topK=4)
print(dcg_score)
assert np.isclose(dcg_score, 1), "1 expected"
Now for the NDCG I need to implement Ideal Discounted Cumulative Gain (IDCG) first and then divide DCG by IDCG.
Here is what I have for NDCG.
def get_ndcg_score(predictions: np.ndarray, test_interaction_matrix: np.ndarray, topK = 10) -> float:
    """
    predictions - np.ndarray - predictions of the recommendation algorithm for each user.
    test_interaction_matrix - np.ndarray - test interaction matrix for each user.
    topK - int - topK recommendations should be evaluated.
    returns - average ndcg score over all users.
    """
    score = None
    # TODO: YOUR IMPLEMENTATION.
    score_idcg = []
    for i, (vp, vt) in enumerate(zip(predictions,test_interaction_matrix)):
        element_sorted = sorted(vp,reverse=True)
        for j, (ele_p, ele_vt) in enumerate(zip(element_sorted, vt)):
            if j == 0 and ele_vt == 1:
                scr = ele_vt
                score_idcg.append(scr)
            if j != 0 and ele_vt == 1:
                scr = ele_vt / np.log2(j+2)
                score_idcg.append(scr)
            if (j != 0 and ele_vt == 0) or (j == 0 and ele_vt == 0):
                continue
    print(score_idcg)
    score_idcg = sum(score_idcg)/len(predictions)
    print(score_idcg)
    score_dcg = get_dcg_score(predictions, test_interaction_matrix, topK = 4)
    score_ndcg = score_dcg / score_idcg
    return score_ndcg
Again I test it on these two arrays:
predictions = np.array([[0, 1, 2, 3], [3, 2, 1, 0], [1, 2, 3, 0], [-1, -1, -1, -1]])
test_interaction_matrix = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]])
ndcg_score = get_ndcg_score(predictions, test_interaction_matrix, topK=4)
assert np.isclose(ndcg_score, 1), "ndcg score is not correct."
Could somebody please look at my code and help me find out why I don't get the right result for the NDCG test? I just can't figure it out. Please also look at the DCG implementation in case it is faulty. Sorry for the horrible code. Write to me if you need more info. Any suggestion is appreciated.
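For comparison, here is a rough sketch of how DCG and NDCG are commonly computed for this kind of data. It rests on several assumptions that the question does not confirm: each row of predictions is a ranked list of item indices, each row of the interaction matrix is a binary relevance vector indexed by item, and users with no relevant items are skipped when averaging (which is what makes the second test expect 1). The names dcg_at_k and ndcg_at_k are hypothetical. Note that the discount depends on the rank position, not on the item id, and that the ideal DCG places all relevant items at the top ranks.

```python
import numpy as np


def dcg_at_k(ranked_items, relevance, k):
    """DCG of one user: gain of the item at rank r is discounted by log2(r + 2)."""
    ranked = np.asarray(ranked_items)[:k]
    gains = np.asarray(relevance)[ranked]          # relevance of each recommended item
    discounts = np.log2(np.arange(2, gains.size + 2))
    return float(np.sum(gains / discounts))


def ndcg_at_k(predictions, interactions, k=10):
    """Mean NDCG over users that have at least one relevant item (assumption)."""
    scores = []
    for ranked, relevance in zip(predictions, interactions):
        n_relevant = int(np.sum(relevance))
        if n_relevant == 0:
            continue  # assumption: users without relevant items are ignored
        # Ideal ranking: all relevant items occupy the first ranks.
        ideal = np.sum(1.0 / np.log2(np.arange(2, min(n_relevant, k) + 2)))
        scores.append(dcg_at_k(ranked, relevance, k) / ideal)
    return float(np.mean(scores)) if scores else 0.0


predictions = np.array([[0, 1, 2, 3], [3, 2, 1, 0], [1, 2, 3, 0], [-1, -1, -1, -1]])
interactions = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]])
print(ndcg_at_k(predictions, interactions, k=4))
```

If your assignment defines the average differently (for example, counting users without relevant items as 0), the skip in ndcg_at_k is the line to change.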
I am trying to implement a version of the Edmonds–Karp algorithm for an undirected graph. The code below works, but it is very slow when working with big matrices.
Is it possible to get the Edmonds–Karp algorithm to run faster, or should I move to another algorithm, like push-relabel? I have thought of using some kind of deque in the BFS, but I don't know how to do that.
The code:
def bfs(C, F, s, t):
    stack = [s]
    paths={s:[]}
    if s == t:
        return paths[s]
    while(stack):
        u = stack.pop()
        for v in range(len(C)):
            if(C[u][v]-F[u][v]>0) and v not in paths:
                paths[v] = paths[u]+[(u,v)]
                if v == t:
                    return paths[v]
                stack.append(v)
    return None

def maxFlow(C, s, t):
    n = len(C) # C is the capacity matrix
    F = [[0] * n for i in range(n)]
    path = bfs(C, F, s, t)
    while path != None:
        flow = min(C[u][v] - F[u][v] for u,v in path)
        for u,v in path:
            F[u][v] += flow
            F[v][u] -= flow
        path = bfs(C,F,s,t)
    return sum(F[s][i] for i in range(n))
C = [[ 0, 3, 3, 0, 0, 0 ], # s
[ 3, 0, 2, 3, 0, 0 ], # o
[ 0, 2, 0, 0, 2, 0 ], # p
[ 0, 0, 0, 0, 4, 2 ], # q
[ 0, 0, 0, 2, 0, 2 ], # r
[ 0, 0, 0, 0, 2, 0 ]] # t
source = 0 # A
sink = 5 # F
maxVal = maxFlow(C, source, sink)
print("max_flow_value is: ", maxVal)
I think your solution can benefit from a better graph representation. In particular, try to keep a list of neighbours for the BFS. I wrote a fairly long answer on the graph representation I use for flow algorithms here: https://stackoverflow.com/a/23168107/812912
If your solution is still too slow, I would recommend switching to Dinic's algorithm; it has served me well in many tasks.
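As a rough illustration of the neighbour-list idea (not the representation from the linked answer), here is a sketch that keeps the capacity matrix from the question but precomputes adjacency lists and runs a true breadth-first search with collections.deque. The name max_flow and the parent-array bookkeeping are my own choices. Note that the question's bfs pops from the end of a list, which makes it a depth-first search; Edmonds–Karp's complexity bound relies on shortest augmenting paths, so popleft() matters.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Edmonds-Karp on a capacity matrix, using adjacency lists for the BFS."""
    if s == t:
        return 0
    n = len(capacity)
    residual = [row[:] for row in capacity]  # residual capacities, updated in place
    # Neighbours in either direction, because flow can be pushed back.
    adj = [[v for v in range(n) if capacity[u][v] > 0 or capacity[v][u] > 0]
           for u in range(n)]
    total = 0
    while True:
        # BFS for a shortest augmenting path; parent[v] is v's predecessor.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in adj[u]:
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return total  # no augmenting path left
        # Find the bottleneck along the path, walking back from t.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the flow and update the residual graph.
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


C = [[0, 3, 3, 0, 0, 0],
     [3, 0, 2, 3, 0, 0],
     [0, 2, 0, 0, 2, 0],
     [0, 0, 0, 0, 4, 2],
     [0, 0, 0, 2, 0, 2],
     [0, 0, 0, 0, 2, 0]]
print("max_flow_value is:", max_flow(C, 0, 5))
```

Storing a parent array instead of copying the whole path for every visited vertex also avoids the quadratic path-building cost in the original bfs.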
I have two sorted, numpy arrays similar to these ones:
x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
Elements never repeat within the same array. I am looking for a Pythonic way of building a list of index pairs giving the locations at which the same element exists in both arrays.
For instance, 1 exists in x and y at index 0. Element 2 in x doesn't exist in y, so I don't care about that item. However, 8 does exist in both arrays - in index 2 in x but index 1 in y. Similarly, 15 exists in both, in index 4 in x, but index 2 in y. So the outcome of my function would be a list that in this case returns [[0, 0], [2, 1], [4, 2]].
So far what I'm doing is:
def get_indexes(x, y):
    indexes = []
    for i in range(len(x)):
        # Find index where item x[i] is in y:
        j = np.where(x[i] == y)[0]
        # If it exists, save it:
        if len(j) != 0:
            indexes.append([i, j[0]])
    return indexes
But the problem is that arrays x and y are very large (millions of items), so it takes quite a while. Is there a better pythonic way of doing this?
Without Python loops
Code
def get_indexes_darrylg(x, y):
    ' darrylg answer '
    # Use intersect to find common elements between two arrays
    overlap = np.intersect1d(x, y)
    # Indexes of common elements in each array
    loc1 = np.searchsorted(x, overlap)
    loc2 = np.searchsorted(y, overlap)
    # Result is the zip two 1d numpy arrays into 2d array
    return np.dstack((loc1, loc2))[0]
Usage
x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
result = get_indexes_darrylg(x, y)
# result: array([[0, 0],
#                [2, 1],
#                [4, 2]], dtype=int64)
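As a possible variation, np.intersect1d can return the index positions itself through its return_indices parameter (available in reasonably recent NumPy versions), which avoids the two searchsorted calls. The sketch below assumes, as stated in the question, that no element repeats within an array; the function name get_indexes_intersect is made up for illustration.

```python
import numpy as np


def get_indexes_intersect(x, y):
    # assume_unique=True relies on the question's guarantee of no duplicates.
    _, ix, iy = np.intersect1d(x, y, assume_unique=True, return_indices=True)
    # Pair up the positions in x and y as rows of a 2-D array.
    return np.column_stack((ix, iy))


x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
print(get_indexes_intersect(x, y))
```

I have not timed this against the other solutions, so treat it as an alternative spelling rather than a faster one.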
Timing Posted Solutions
Results show that darrylg's code has the fastest run time.
Code Adjustment
Each posted solution is wrapped in a function.
Each solution is slightly modified so that it outputs a NumPy array.
Each curve is named after its poster.
Code
import numpy as np
import perfplot
def create_arr(n):
    ' Creates pair of 1d numpy arrays with half the elements equal '
    max_val = 100000 # One more than largest value in output arrays
    arr1 = np.random.randint(0, max_val, (n,))
    arr2 = arr1.copy()
    # Change half the elements in arr2
    all_indexes = np.arange(0, n, dtype=int)
    indexes = np.random.choice(all_indexes, size = n//2, replace = False) # locations to make changes
    np.put(arr2, indexes, np.random.randint(0, max_val, (n//2, ))) # assign new random values at change locations
    arr1 = np.sort(arr1)
    arr2 = np.sort(arr2)
    return (arr1, arr2)

def get_indexes_lllrnr101(x,y):
    ' lllrnr101 answer '
    ans = []
    i=0
    j=0
    while (i<len(x) and j<len(y)):
        if x[i] == y[j]:
            ans.append([i,j])
            i += 1
            j += 1
        elif (x[i]<y[j]):
            i += 1
        else:
            j += 1
    return np.array(ans)

def get_indexes_joostblack(x, y):
    'joostblack'
    indexes = []
    for idx,val in enumerate(x):
        idy = np.searchsorted(y,val)
        try:
            if y[idy]==val:
                indexes.append([idx,idy])
        except IndexError:
            continue # ignore index errors
    return np.array(indexes)

def get_indexes_mustafa(x, y):
    indices_in_x = np.flatnonzero(np.isin(x, y)) # array([0, 2, 4])
    indices_in_y = np.flatnonzero(np.isin(y, x[indices_in_x])) # array([0, 1, 2])
    return np.array(list(zip(indices_in_x, indices_in_y)))

def get_indexes_darrylg(x, y):
    ' darrylg answer '
    # Use intersect to find common elements between two arrays
    overlap = np.intersect1d(x, y)
    # Indexes of common elements in each array
    loc1 = np.searchsorted(x, overlap)
    loc2 = np.searchsorted(y, overlap)
    # Result is the zip two 1d numpy arrays into 2d array
    return np.dstack((loc1, loc2))[0]

def get_indexes_akopcz(x, y):
    ' akopcz answer '
    return np.array([
        [i, j]
        for i, nr in enumerate(x)
        for j in np.where(nr == y)[0]
    ])

perfplot.show(
    setup = create_arr, # tuple of two 1D random arrays
    kernels=[
        lambda a: get_indexes_lllrnr101(*a),
        lambda a: get_indexes_joostblack(*a),
        lambda a: get_indexes_mustafa(*a),
        lambda a: get_indexes_darrylg(*a),
        lambda a: get_indexes_akopcz(*a),
    ],
    labels=["lllrnr101", "joostblack", "mustafa", "darrylg", "akopcz"],
    n_range=[2 ** k for k in range(5, 21)],
    xlabel="Array Length",
    # More optional arguments with their default values:
    # logx="auto", # set to True or False to force scaling
    # logy="auto",
    equality_check=None, #np.allclose, # set to None to disable "correctness" assertion
    # show_progress=True,
    # target_time_per_measurement=1.0,
    # time_unit="s", # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
    # relative_to=1, # plot the timings relative to one of the measurements
    # flops=lambda n: 3*n, # FLOPS plots
)
What you are doing is O(n·m), because np.where scans all of y for every element of x.
If you want, you can do it in O(n + m) by iterating over both arrays with two pointers: since the arrays are sorted, advance the pointer of the array whose current element is smaller.
See below:
x = [1, 2, 8, 11, 15]
y = [1, 8, 15, 17, 20, 21]
def get_indexes(x,y):
    ans = []
    i=0
    j=0
    while (i<len(x) and j<len(y)):
        if x[i] == y[j]:
            ans.append([i,j])
            i += 1
            j += 1
        elif (x[i]<y[j]):
            i += 1
        else:
            j += 1
    return ans
print(get_indexes(x,y))
which gives me:
[[0, 0], [2, 1], [4, 2]]
The function below searches for every occurrence of x[i] in the y array, but since duplicates are not allowed in y it will find each x[i] at most once.
def get_indexes(x, y):
    return [
        [i, j]
        for i, nr in enumerate(x)
        for j in np.where(nr == y)[0]
    ]
You can use numpy.searchsorted:
def get_indexes(x, y):
    indexes = []
    for idx,val in enumerate(x):
        idy = np.searchsorted(y,val)
        # searchsorted returns len(y) when val is larger than every element of y
        if idy < len(y) and y[idy]==val:
            indexes.append([idx,idy])
    return indexes
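The same idea can probably be vectorised, since np.searchsorted accepts a whole array of values. The sketch below is an assumption-laden variant rather than the answer's code: it assumes y is sorted and non-empty, and the name get_indexes_vectorized is hypothetical.

```python
import numpy as np


def get_indexes_vectorized(x, y):
    # Insertion position in y for every element of x (y must be sorted).
    idy = np.searchsorted(y, x)
    # Positions equal to len(y) would be out of bounds, so clamp them
    # before comparing; clamped positions can never produce a false match
    # because those x values are larger than every element of y.
    clipped = np.minimum(idy, len(y) - 1)
    mask = y[clipped] == x
    return np.column_stack((np.flatnonzero(mask), idy[mask]))


x = np.array([1, 2, 8, 11, 15])
y = np.array([1, 8, 15, 17, 20, 21])
print(get_indexes_vectorized(x, y))
```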
One solution is to first look from x's side to see what values are included in y by getting their indices through np.isin and np.flatnonzero, and then use the same procedure from the other side; but instead of giving x entirely, we give only the (already found) intersected elements to gain time:
indices_in_x = np.flatnonzero(np.isin(x, y)) # array([0, 2, 4])
indices_in_y = np.flatnonzero(np.isin(y, x[indices_in_x])) # array([0, 1, 2])
Now you can zip them to get the result:
result = list(zip(indices_in_x, indices_in_y)) # [(0, 0), (2, 1), (4, 2)]
import numpy as np
training_set = np.array([[0, 1, 0, 1, 0, 1],[0, 0, 0, 1, 0, 0],[0, 0, 0, 0, 1, 0],[1, 0, 1, 0, 1, 0],[0, 1, 1, 1, 0, 1],[0, 1, 0, 0, 1, 1],[1, 1, 1, 0, 0, 0],[1, 1, 1, 1, 0, 1],[0, 1, 1, 0, 1, 0],[1, 1, 0, 0, 0, 1],[1, 0, 0, 0, 1, 0]])
def p(X):
    Fx = X[:,X.shape[1]-1]
    x0= 0
    x1= 0
    for i in range(len(Fx)):
        if Fx[i-1] == 1:
            x0 = x0+1
        else:
            x1 = x1+1
    P0 = x0/len(Fx)
    P1 = x1/len(Fx)
    return(P0,P1)

def H(X):
    result = -p(X)[0]*np.log(p(X)[0])-p(X)[1]*np.log(p(X)[1]) #needs to be log2
    print("1 = pure, 0 = unpure 1/2 = decision can be random: Calculating Entropy: -" + str(p(X)[0]) + "*" + str(np.log(p(X)[0])) + "-" + str(p(X)[1]) + "*" + str(np.log(p(X)[1])) )
    return result

def Q(X,i):
    Xi = X[:,i]
    result0= 0
    result1= 0
    for j in range(len(Xi)):
        if Xi[j] == 1:
            result1 = result1 + len(X[i,:])
        else: result0 = result0 + len(X[i,:])
    result1 = result1/len(X)
    result0 = result0/len(X)
    return(result0,result1)

def X_column(X,i,v):
    list = X[np.where(X[:,i] == v)]
    return list

def IG(X,i):
    result = H(X)-Q(X,i)[0]*H(X_column(X,i,0))-Q(X,i)[1]*H(X_column(X,i,1))
    return result

# To train a decision tree on a learning set S, we use the following algorithm (ID3):
# 1. Take the example set S.
# 2. If |{f(x) : (x, f(x)) ∈ S}| = 1, create a leaf with label f(x).
# 3. For i = 1, 2, ..., n calculate the value IG(S, i).
# 4. Let j be the index of the largest of the calculated values.
# 5. Create a node with label Xj.
# 6. For the subsets
#    S0 = {(x, f(x)) ∈ S : xj = 0}
#    and
#    S1 = {(x, f(x)) ∈ S : xj = 1}
#    run the algorithm recursively (for S ← S0 and S ← S1) and add the new nodes as children of the node with label j.
def ID3(S, recursion = 0, label = 0, tree = np.array()):
    result = np.array()
    recursion += 1
    rows = S.shape[0]
    columns = S.shape[1]
    if S[:,columns-1] == True:
        tree[recursion]= S[0,columns-1]
        break
    for i in range(rows):
        result[i]= IG(S,i)
    j = result.max()
    tree[recursion]= 1 # what is the label xj?
    S0 = X_column(S,i,0)
    tree[recursion+1] = ID3(S0,recursion = recursion )
    S1 = X_column(S,i,1)
    tree[recursion+2] = ID3(S1,recursion = recursion)
    return tree

def pruning():
    return tree
I have been working on implementing the ID3 algorithm (decision tree), but I have no idea how to handle the recursion. I've also translated the algorithm steps from my lab sheet. Most of the necessary functions are already written, but I just can't grasp the concept of recursion at this level, and most tutorials only cover trivial cases.
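One way to think about the recursion is that each call returns either a leaf (a label) or a node that holds the results of two smaller calls. The following is a minimal sketch under several assumptions of mine: features and labels are binary, the last column is the label, the tree is stored as a nested dict, and when no features remain or a subset is empty the majority label is used. The helper names (entropy, information_gain, id3, predict) are hypothetical and do not reuse the functions from the question.

```python
import numpy as np


def entropy(labels):
    """Binary entropy (log base 2) of a 0/1 label vector."""
    if labels.size == 0:
        return 0.0
    p1 = labels.mean()
    probs = np.array([p1, 1.0 - p1])
    probs = probs[probs > 0]  # skip zero terms, since 0 * log(0) counts as 0
    return float(-(probs * np.log2(probs)).sum())


def information_gain(S, i):
    labels = S[:, -1]
    gain = entropy(labels)
    for v in (0, 1):
        subset = labels[S[:, i] == v]
        gain -= subset.size / labels.size * entropy(subset)
    return gain


def id3(S, features=None):
    labels = S[:, -1]
    if features is None:
        features = list(range(S.shape[1] - 1))
    # Step 2: only one label value left, so return a leaf.
    if np.all(labels == labels[0]):
        return int(labels[0])
    majority = int(labels.mean() >= 0.5)
    if not features:          # assumption: fall back to the majority label
        return majority
    # Steps 3-5: pick the feature with the largest information gain.
    gains = [information_gain(S, i) for i in features]
    j = features[int(np.argmax(gains))]
    remaining = [f for f in features if f != j]
    node = {"feature": j}
    # Step 6: recurse on S0 and S1 and attach the results as children.
    for v in (0, 1):
        subset = S[S[:, j] == v]
        node[v] = id3(subset, remaining) if subset.shape[0] else majority
    return node


def predict(tree, x):
    # Walk down the nested dicts until a leaf label is reached.
    while isinstance(tree, dict):
        tree = tree[int(x[tree["feature"]])]
    return tree


S = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]])
tree = id3(S)
print(tree)
```

The key point is that the recursive call's return value is the subtree itself, so there is no need to pass a shared tree array or a recursion counter between calls.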
I am trying to implement a simple neural net. I want to print the initial pattern, weights and activations. I then want it to print the learning process (i.e. every pattern it goes through as it learns). So far I am unable to do this: it prints the initial and final pattern (when I put print p in appropriate places), but nothing else. Hints and tips appreciated, I'm a complete newbie to Python!
#!/usr/bin/python
import random
p = [ [1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1] ] # pattern I want the net to learn
n = 5
alpha = 0.01
activation = [] # unit activations
weights = [] # weights
output = [] # output
def initWeights(n): # set weights to zero, n is the number of units
    global weights
    weights = [[[0]*n]*n] # initialised to zero

def initNetwork(p): # initialises units to activation
    global activation
    activation = p

def updateNetwork(k): # pick unit at random and update k times
    for l in range(k):
        unit = random.randint(0,n-1)
        activation[unit] = 0
        for i in range(n):
            activation[unit] += output[i] * weights[unit][i]
        output[unit] = 1 if activation[unit] > 0 else -1

def learn(p):
    for i in range(n):
        for j in range(n):
            weights += alpha * p[i] * p[j]
You have a problem with the line:
weights = [[[0]*n]*n]
When you use *, you multiply object references, not objects. The same n-length list of zeroes is reused every time. This will cause:
>>> weights[0][1][0] = 8
>>> weights
[[[8, 0, 0], [8, 0, 0], [8, 0, 0]]]
The first item of all the sublists is 8, because they are one and the same list. You stored the same reference multiple times, and so modifying the n-th item on any of them will alter all of them.
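A small sketch of the difference, assuming a plain n-by-n matrix is what is wanted (the original line also wraps the matrix in one extra outer list, which is dropped here). Building each row inside a list comprehension creates a fresh list per row:

```python
n = 3

# Shared rows: the outer * copies the reference to one inner list.
shared = [[0] * n] * n
shared[0][0] = 8
print(shared)   # every row shows the 8

# Independent rows: the comprehension evaluates [0] * n once per row.
weights = [[0] * n for _ in range(n)]
weights[0][0] = 8
print(weights)  # only the first row changes
```

The inner [0] * n is safe because integers are immutable; the problem only appears when the repeated element is itself a mutable object such as a list.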
This is the line where you get "IndexError: list index out of range":
output[unit] = 1 if activation[unit] > 0 else -1
because output = [] is still empty at that point; you should do output.append() or ...