Propagate ratings in a graph structure

Propagate ratings in a graph structure - python

I have the following problem:
Consider a weighted direct graph.
Each node has a rating and the weighted edges represents
the "influence" of a node on its neighbors.
When a node rating change, the neighbors will see their own rating modified (positively or negatively)
How to propagate a new rating on one node?
I think this should be a standard algorithm but which one?
This is a general question but in practice I am using Python ;)
Thanks
[EDIT]
The rating is a simple float value between 0 to 1: [0.0,1.0]
There is certainly a convergence issue: I want just limit the propagation to a few iteration...

There is an easy standard way to do it as follows:
let G=(V,E) be the graph
let w:E->R be a weight function such that w(e) = weight of edge e
let A be an array such that A[v] = rating(v)
let n be the required number of iterations
for i from 1 to n (inclusive) do:
for each vertex v in V:
A'[v] = calculateNewRating(v,A,w) #use the array A for the old values and w
A <- A' #assign A with the new values which are stored in A'
return A
However, for some cases - you might have better algorithms based on the features of the graph and how the rating for each node is recalculated. For example:
Assume rating'(v) = sum(rating(u) * w(u,v)) for each (u,v) in E, and you get a variation of Page Rank, which is guaranteed to converge to the principle eigenvector if the graph is strongly connected (Perron-Forbenius theorem), so calculating the final value is simple.
Assume rating'(v) = max{ rating(u) | for each (u,v) in E}, then it is also guaranteed to converge and can be solved linearly using strongly connected components. This thread discusses this case.

Related

Find "hubs" in a text-data correlation matrix from fuzzywuzzy

If I have a list of strings, how do I select some 'representative' strings such that between them, they can fuzzy match with all of the strings in the list.
The first step, fuzzy matching all the texts has been done and it looks like this
My idea is to select two or three strings that can act as a representative for the whole set such that if I fuzzy match, I can flag all of them as 1 with a >80 threshold.
Is there a way I can do it?

First, build the binary adjacency matrix of the graph A_ij=1 if fuzzymatch(i,j)>80, 0 otherwise. If you don't care about having a minimal set, you can use a greedy algorithm
#V is wordset nodes, initialize Hubset
Hubset = {}
while {x for x in V x not in Hubset and sum([A[x][j] for j in Hubset]) == 0}:
#Choose "most central node" c in V(e.g. one with most connections to elements not already connected to Hubset)
Hubset.add(c)
return Hubset
The minimal set problem can be formulated as a linear integer program, you can use a MLP solver for this:
Variables: x_1...x_n (x_i 1 if i in hub, 0 if not in hub)
minimize sum_i x_i
subject to
sum_i A[i][j]x_i>=1 for all j (j constraints)
See the networkx documentation for some measures of graph centrality

Checking if a graph contain a given induced subgraph

I'm trying to detect some minimal patterns with properties in random digraphs. Namely, I have a list called patterns of adjacency matrix of various size. For instance, I have [0] (a sink), but also a [0100 0001 1000 0010] (cycle of size 4), [0100, 0010, 0001, 0000] a path of length 3, etc.
When I generate a digraph, I compute all sets that may be new patterns. However, in most of the case it is something that I don't care about: for instance, if the potential new pattern is a cycle of size 5, it does not teach me anything because it has a cycle of length 3 as an induced subgraph.
I suppose one way to do it would look like this:
#D is the adjacency matrix of a possible new pattern
new_pattern = True
for pi in patterns:
k = len(pi)
induced_subgraphs = all_induced_subgraphs(D, k)
for s in induced_subgraphs:
if isomorphic(s, pi):
new_pattern = False
break
where all_induced_subgraphs(D,k) gives all possible induced subgraphs of D of size k, and isomorphic(s,pi) determines if s and pi are isomorphic digraphs.
However, checking all induced subgraphs of a digraph seems absolutely horrible to do. Is there a clever thing to do there?

Thanks to #Stef I learned that this problem has a name
and can be solved using on netwokx with a function described on this page.
Personally I use igraph on my project so I will use this.

Find shortest triangle in a graph

I have a set of points and need to select an optimal subset of 3 of them, where the criterion is a linear sum of some properties of the points, and some properties of pairs of the points.
In Python, this is quite easy using itertools.combinations:
all_points = combinations(points, 3)
costs = []
for i, (p1, p2, p3) in enumerate(all_points):
costs.append((p1.weight + p2.weight + p3.weight
+ pair_weight(p1, p2) + pair_weight(p1, p3) + pair_weight(p2, p3),
i))
costs.sort()
best = all_points[costs[0][1]]
The problem is that this is a brute force solution, requiring to enumerate all possible combinations of 3 points, which is O(n^3) in the number of points and therefore easily leads to a very large number of evaluations to perform. I have been trying to research whether there is a more efficient way to do this, perhaps taking advantage of the linearity of the cost function.
I have tried turning this into a networkx graph featuring node and edge weights. However, I have not yet found an algorithm in that toolkit that can calculate the "shortest triangle", particularly one that considers both edge and node weights. (Shortest path algorithms tend to only consider edge weights for example.)
There are functions to enumerate all cliques, and then I can select 3-cliques, and calculate the cost, but this is also brute force and therefore not better than doing it with combinations as above.
Are there any other algorithms I can look at?
By the way, if I do not have the edge weights, it is easy to just sort the nodes by their node-weight and choose the first three. So it is really the paired costs that add complexity to this problem. I am wondering if somehow I can just list all pairs and find the top-k of those that form triangles, or something better? At least if I could efficiently enumerate top candidates and stop the enumeration on some heuristic, it might be better than the brute force approach.

From now on, I will use n as the number of nodes and m as the number of edges. If your graph is fully connected, then m is just n choose 2. I'll also disregard node weights, because as the comments to your initial post have noted, the node weights can be absorbed into the edges they're connected to.
Your algorithm is O(n^3); it's hopefully not too hard to see why: You iterate over every possible triplet of nodes. However, it is possible to iterate over every triangle in a graph in O(m sqrt(m)):
for every node u:
for every node v adjacent to u:
if degree(u) < degree(v): continue;
for every node w adjacent to v:
if degree(v) < degree(w): continue;
if u is not connected to w: continue;
// <u,v,w> is a triangle!
The proof for this algorithm's runtime of O(m sqrt(m)) is nontrivial, so I'll direct you here: https://cs.stanford.edu/~rishig/courses/ref/l1.pdf
If your graph is fully connected, then you've gotta stick with the O(n^3), I think. There might be some early-pruning ideas you can do but they won't lead to a significant speedup, probably 2x at very best.

How to implement a cost minimization objective function correctly in Gurobi?

Given transport costs, per single unit of delivery, for a supermarket from three distribution centers to ten separate stores.
Note: Please look in the #data section of my code to see the data that I'm not allowed to post in photo form. ALSO note while my costs are a vector with 30 entries. Each distribution centre can only access 10 costs each. So DC1 costs = entries 1-10, DC2 costs = entries 11-20 etc..
I want to minimize the transport cost subject to each of the ten stores demand (in units of delivery).
This can be done by inspection. The the minimum cost being $150313. The problem being implementing the solution with Python and Gurobi and producing the same result.
What I've tried is a somewhat sloppy model of the problem in Gurobi so far. I'm not sure how to correctly index and iterate through my sets that are required to produce a result.
This is my main problem: The objective function I define to minimize transport costs is not correct as I produce a non-answer.
The code "runs" though. If I change to maximization I just get an unbounded problem. So I feel like I am definitely not calling the correct data/iterations through sets into play.
My solution so far is quite small, so I feel like I can format it into the question and comment along the way.
from gurobipy import *
#Sets
Distro = ["DC0","DC1","DC2"]
Stores = ["S0", "S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9"]
D = range(len(Distro))
S = range(len(Stores))
Here I define my sets of distribution centres and set of stores. I am not sure where or how to exactly define the D and S iteration variables to get a correct answer.
#Data
Demand = [10,16,11,8,8,18,11,20,13,12]
Costs = [1992,2666,977,1761,2933,1387,2307,1814,706,1162,
2471,2023,3096,2103,712,2304,1440,2180,2925,2432,
1642,2058,1533,1102,1970,908,1372,1317,1341,776]
Just a block of my relevant data. I am not sure if my cost data should be 3 separate sets considering each distribution centre only has access to 10 costs and not 30. Or if there is a way to keep my costs as one set but make sure each centre can only access the costs relevant to itself I would not know.
m = Model("WonderMarket")
#Variables
X = {}
for d in D:
for s in S:
X[d,s] = m.addVar()
Declaring my objective variable. Again, I'm blindly iterating at this point to produce something that works. I've never programmed before. But I'm learning and putting as much thought into this question as possible.
#set objective
m.setObjective(quicksum(Costs[s] * X[d, s] * Demand[s] for d in D for s in S), GRB.MINIMIZE)
My objective function is attempting to multiply the cost of each delivery from a centre to a store, subject to a stores demand, then make that the smallest value possible. I do not have a non zero constraint yet. I will need one eventually?! But right now I have bigger fish to fry.
m.optimize()
I produce a 0 row, 30 column with 0 nonzero entries model that gives me a solution of 0. I need to set up my program so that I get the value that can be calculated easily by hand. I believe the issue is my general declaring of variables and low knowledge of iteration and general "what goes where" issues. A lot of thinking for just a study exercise!
Appreciate anyone who has read all the way through. Thank you for any tips or help in advance.

Your objective is 0 because you do not have defined any constraints. By default all variables have a lower bound of 0 and hence minizing an unconstrained problem puts all variables to this lower bound.
A few comments:
Unless you need the names for the distribution centers and stores, you could define them as follows:
D = 3
S = 10
Distro = range(D)
Stores = range(S)
You could define the costs as a 2-dimensional array, e.g.
Costs = [[1992,2666,977,1761,2933,1387,2307,1814,706,1162],
[2471,2023,3096,2103,712,2304,1440,2180,2925,2432],
[1642,2058,1533,1102,1970,908,1372,1317,1341,776]]
Then the cost of transportation from distribution center d to store s are stored in Costs[d][s].
You can add all variables at once and I assume you want them to be binary:
X = m.addVars(D, S, vtype=GRB.BINARY)
(or use Distro and Stores instead of D and S if you need to use the names).
Your definition of the objective function then becomes:
m.setObjective(quicksum(Costs[d][s] * X[d, s] * Demand[s] for d in Distro for s in Stores), GRB.MINIMIZE)
(This is all assuming that each store can only be delivered from one distribution center, but since your distribution centers do not have a maximal capacity this seems to be a fair assumption.)
You need constraints ensuring that the stores' demands are actually satisfied. For this it suffices to ensure that each store is being delivered from one distribution center, i.e., that for each s one X[d, s] is 1.
m.addConstrs(quicksum(X[d, s] for d in Distro) == 1 for s in Stores)
When I optimize this, I indeed get an optimal solution with value 150313.

How would I use an Artificial Intelligence algorithm to a program for it to learn and assign appropriate weight values?

I have a program I am writing and I am wondering how I would use some AI algorithm for my program so that it can learn and assign appropriate weight values to my fields.
For example I have fields a, b, c, d, and e. Each of these fields would have different weights because field a is more valuable than d. I was wondering how I would go about doing this so I can normalize my values and use a sum of these values to compare.
Example:
Weight of a = 1
Weight of b = 2
Weight of c = 3
Weight of d = 4
Weight of e = 5
For the sum, multiply each field's value with its assigned weight:
Result = (value of a) * 1 + (value of b) * 2 + (value of c) * 3 + (value of d) * 4 + (value of e) * 5
I am looking to input some training data and train my program to learn and compare the a,b,c,d,e values possessed by each object so that it can assign weights to each one.
EDIT: I am just looking for the method to approach this, whether it be by using neural nets, or some other means to learn and assign weights to these fields.

The best way to do this depends a lot on what kind of a program you're writing. How do you assess how good of an answer result is?
If result can either be correct or incorrect in a categorical way, then a neural net would be a great option. You could use a two-layer topology (i.e. all of the input nodes are connected to each output node, with no layer in between), and have each input node correspond to one of your fields (a, b, c, etc.). You can then use backpropagation to train the network such that each set of field values maps to the correct category. The edge weights that you end up with at the end will be the weights to associate with each field.
However, if result can either be either more or less accurate in some sort of a continuous way, a genetic algorithm is probably a better solution. This could be the case if you're comparing result to some ideal value, if you're using the weights in some sort of function with an evaluatable outcome (like a game), or some other similar situation. The fitness function that you use will depend on your exact circumstances (for the examples above you might use proximity to the ideal value, or win-loss ratio when playing the game with those values). There are a variety of ways that you could format the candidate solutions:
One option would be to use a sequence of bitstrings representing each weight in binary. Mutations could flip any bit and crossover could occur at any point along the string (or you could only allow it to occur between numbers)
If you want to allow for floating point values, though, you might be
better off using a system where each candidate solution is a list of weights. Mutations can add or subtract from a given weight an crossover can occur at any point within the list.
If you want to provide more information on what specifically your program is trying to accomplish, I can try to offer a more specific suggestion.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.