I am implementing a hierarchical clustering algorithm (with similarity) using Python 3.6. The code below basically builds a new empty graph and then recursively keeps connecting the groups (represented by lists here) with the largest similarity in the original graph.
At position 1 in the code I want to save the best partition, but what the function returns is always exactly the same as comminity_list. It looks like best_partition = comminity_list makes best_partition point to the address of comminity_list. How come that happens, what did I get wrong here, and how should I fix it?
def pearson_clustering(G):
    H = nx.create_empty_copy(G)  # build an empty copy of G (no edges)
    best = 0      # best modularity so far
    current = 0   # current modularity
    A = nx.adj_matrix(G)  # get adjacency matrix
    org_deg = deg_dict(A, G.nodes())  # degrees of G
    org_E = G.number_of_edges()  # number of edges of G
    comminity_list = intial_commnity_list(G)  # returns a list of lists
    best_partition = None
    p_table = pearson_table(G)  # dict of the Pearson correlation of each pair
    l = len(comminity_list)
    while True:
        if l == 2:
            break
        current = modualratiry(H, org_deg, org_E)  # find current modularity
        l = len(comminity_list)
        p_build_cluster(p_table, H, G, comminity_list)  # build clustering on H
        if best < current:
            best_partition = comminity_list  # position 1
            best = current  # keep the clustering with the largest modularity
    return best_partition  # position 2
That is just Python's assignment behaviour. When you do "best_partition = comminity_list" you don't copy anything; you make best_partition another name for the same list object.
If you want to explicitly copy the list you can use slicing (which copies the contents of comminity_list into a new list):
best_partition = comminity_list[:]
or the copy method. If comminity_list has sublists you will need the deepcopy function from the copy module instead (otherwise you will get a copy of the outer list, but the sublists will still be shared references):
best_partition = comminity_list.copy()
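A minimal sketch of the difference between an alias, a shallow copy, and a deep copy (the nested list here is just illustrative):

```python
from copy import deepcopy

a = [[1, 2], [3, 4]]
alias = a            # no copy: just another name for the same list
shallow = a.copy()   # new outer list, but the sublists are still shared
deep = deepcopy(a)   # fully independent copy, sublists included

a[0][0] = 99
print(alias[0][0])    # 99: the alias sees the change
print(shallow[0][0])  # 99: the shared sublist changed too
print(deep[0][0])     # 1: the deep copy is unaffected
```

In the clustering code above, because comminity_list is a list of lists, deepcopy is the variant that actually freezes the partition at position 1.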
I want to change a parameter in a list of elements QF and QD as follows:
lattice is a list that contains these elements, for example:
lattice = [QF, QD, QF, QD]
The indexes of these elements in the list are given in quad_indexes.
QF and QD have two parameters (K and FamName):
QF.K = 1.7 and QD.K = -2.1
QF.FamName = 'QF'
QD.FamName = 'QD'
I want to give a random value to the parameter K of each element individually.
I tried:
i = 0
Quad_strength_err = []
while i < len(quad_indexes):
    if lattice[quad_indexes[i]].FamName == 'QF':
        lattice[quad_indexes[i]].K *= (1 + errorQF * random())
    elif lattice[quad_indexes[i]].FamName == 'QD':
        lattice[quad_indexes[i]].K *= (1 + errorQD * random())
    i += 1

for j in quad_indexes:
    quad_strength_err = lattice[j].K
    Quad_strength_err.append(quad_strength_err)
The problem is that when I print(Quad_strength_err) I get a fixed value for each of QF and QD, for example:
[1.8729820159805597, -2.27910323371567, 1.8729820159805597, -2.27910323371567]
I am looking for a result like, for example:
[1.7729820159805597, -2.17910323371567, 1.8729820159805597, -2.27910323371567]
TL;DR
You need to make copies of QF and QD - you're making aliases.
The problem
The problem is almost certainly due to aliasing.
When you initialize lattice with this line:
lattice = [QF, QD, QF, QD]
what you are doing is creating a structure with two pointers to QF and two pointers to QD.
In your loop, you then modify QF twice, once via lattice[0] and again via lattice[2], and ditto for QD.
The solution
What you need to do is create copies, maybe shallow, maybe deep.
Try this option:
from copy import copy
lattice = [copy(QF), copy(QD), copy(QF), copy(QD)]
and if that still doesn't work, you might need a deepcopy:
from copy import deepcopy
lattice = [deepcopy(QF), deepcopy(QD), deepcopy(QF), deepcopy(QD)]
Or, a more compact version of the same code, just cause I like comprehensions:
from copy import deepcopy
lattice = [deepcopy(x) for x in (QF, QD, QF, QD)]
If your code is correct, you can't expect different results for the first and third elements of lattice (respectively the second and last), since they are (to simplify) the same element.
Using id, you can easily show that lattice[0] and lattice[2] share the same id, therefore modifying lattice[0] will modify lattice[2].
You can duplicate each object QF and QD to get rid of this behaviour.
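For example (Elem here is a hypothetical stand-in for however QF and QD are actually defined):

```python
class Elem:
    def __init__(self, K, FamName):
        self.K = K
        self.FamName = FamName

QF = Elem(1.7, 'QF')
QD = Elem(-2.1, 'QD')
lattice = [QF, QD, QF, QD]

print(id(lattice[0]) == id(lattice[2]))  # True: both slots hold the very same object
lattice[0].K = 3.0
print(lattice[2].K)  # 3.0 as well, because it is the same object
```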
I tried to build a working sample of your problem, starting by building simple QF and QD classes:
from random import random

class QF():
    def __init__(self):
        self.K = 1.7
        self.FamName = "QF"

class QD():
    def __init__(self):
        self.K = -2.1
        self.FamName = "QD"
Then I create the lattice with different instances of the classes by calling them with ():
lattice = [QF(), QD(), QF(), QD()]
I think your mistake comes from this step: QF refers to the class itself, while QF() creates a brand new instance that you can modify separately from the other ones. For example, if you do a = QF() and b = QF(), changing a.K will not affect b.K.
Finally I apply some randomness using the random function imported previously (errorQF and errorQD are assumed to be defined elsewhere):
for i in lattice:
    if i.FamName == "QF":
        i.K = (1 + errorQF * random())
    elif i.FamName == "QD":
        i.K = (1 + errorQD * random())
from which I get:
print(*[i.K for i in lattice])
>>>> 1.148989048860669 0.9324164812782919 1.0652187255939742 0.6860911849022292
I am having a problem with a genetic feature-optimization algorithm that I am attempting to build. The idea is that a specific combination of features is tested, and if the model accuracy using those features is higher than the previous maximum, then that combination of features replaces the previous maximum combination. By running through the remaining potential features in this way, the final combination should be the optimal combination of features for the given feature-vector size. Currently, the code that attempts to achieve this looks like:
def mutate_features(features, feature):
    new_features = features
    index = random.randint(0, len(features) - 1)
    new_features[index] = feature
    return new_features

def run_series(n, f_list, df):
    features_list = []
    results_list = []
    max_results_list = [[0, 0, 0, 0, 0]]
    max_feature_list = []
    features = [0, 0, 0, 0, 1]
    for i in range(0, 5):  # 5 has just been chosen as the range for testing purposes
        results = run_algorithm(df, f_list, features)
        features_list.append(features)
        results_list.append(results)
        if check_result_vector(max_results_list, results):
            max_results_list.append(results)
            max_feature_list.append(features)
        else:
            print("Revert to previous: " + str(max_feature_list[-1]))
            features = max_feature_list[-1]
        features = mutate_features(features, f_list[i])
    print("Feature List = " + str(features_list))
    print("Results List = " + str(results_list))
    print("Max Results List = " + str(max_results_list))
    print("Max Feature List = " + str(max_feature_list))
The output from this code was included as a screenshot (not reproduced here).
The part that I do not understand is the output of max_feature_list and features_list.
If anything is added through .append() to max_feature_list or features_list inside the for loop, it seems to change all items that are already members of the list to be the same as the latest addition. I may not fully understand the syntax/logic around this and would really appreciate any feedback as to why the program is doing this.
It happens because you change the values of features inside the mutate_features function, and since append stores a reference, the values already in max_feature_list change too, because the underlying object changed.
One way to prevent this behaviour is to deepcopy features inside mutate_features, mutate the copy as you want, and return it.
For example:
from copy import deepcopy
import random

def mutate_features(features, feature):
    new_features = deepcopy(features)
    index = random.randint(0, len(features) - 1)
    new_features[index] = feature
    return new_features

features = [1, 2, 3]
res = []
res.append(features)
features = mutate_features(features, 9)  # 9 is an arbitrary new feature value
res.append(features)
print(res)
I've implemented a program in Python which generates random binary trees. Now I'd like to assign to each internal node of the tree a distance that makes the tree ultrametric: the distance between the root and any leaf must be the same, and if a node is a leaf its distance is zero. Here is a node:
class Node():
    def __init__(self, G=None, D=None):
        self.id = ""
        self.distG = 0
        self.distD = 0
        self.G = G
        self.D = D
        self.parent = None
My idea is to set the total height h at the beginning and to decrease it as internal nodes are found, but it is only working on the left side.
def lgBrancheRand(self, h):
    self.distD = h
    self.distG = h
    hrandomD = round(np.random.uniform(0, h), 3)
    hrandomG = round(np.random.uniform(0, h), 3)
    if self.D.D is not None:
        self.D.distD = hrandomD
        self.distD = round(h - hrandomD, 3)
        lgBrancheRand(self.D, hrandomD)
    if self.G.G is not None:
        self.G.distG = hrandomG
        self.distG = round(h - hrandomG, 3)
        lgBrancheRand(self.G, hrandomG)
In summary, you would create random matrices and apply UPGMA to each.
More complete answer below
Simply use the UPGMA algorithm. This is a clustering algorithm used to resolve a pairwise distance matrix.
You take the total genetic distance between two pairs of "taxa" (technically OTUs) and divide it by two. You assign the closest members of the pairwise matrix as the first 'node'. Reformat the matrix so these two pairs are combined into a single group ('removed') and find the next 'nearest neighbor', ad infinitum. I suspect R's 'ape' will have an ultrametric algorithm which will save you from programming. I see that you are using Python, so BioPython MIGHT have this (big MIGHT); personally I would pipe this through a precompiled C program and collect the results via PAUP, that sort of thing. I'm not going to write code, because I prefer Perl and get flamed if any Perl code appears in a Python question (the Empire has been established).
Anyway, you will find this algorithm produces a perfect ultrametric tree. Purists do not like ultrametric trees derived through this sort of algorithm. However, in your calculation it could be useful, because you could find which phylogeny from real data is most "clock-like" against the null distribution you are producing. In this context it would be cool.
You might prefer to raise the question on bioinformatics stackexchange.
So I am trying to define the exact shape (outer layer) of a building from .osm node references, because I need to create more detailed structure inside it (rooms, walls) based on some assumptions.
Until now I've extracted the nodes' coordinates from the references of 'building:part' using pyosmium, stored the coordinates in a list of tuples, reconstructed the shape using the Polygon function from shapely, and plotted it using mplleaflet. But somehow the nodes from the reference aren't sorted, and when I try to plot it, there are a lot of intersections shown.
My current method of solving this sorting problem is the following:
def distance(current_data, x_point):
    # find the shortest distance between x_point and all points in current_data
    # check if it's perpendicular to the line built before
    return current_data[x]  # perpendicular, with the shortest distance to x_point

def sort_nodes(nodes):
    temp = []
    for node in nodes:
        if len(temp) < 2:  # add the first 2 points as a starting line
            temp.append(node)
        else:
            n = distance(temp, node)
            # find index of current_data[x] in temp
            temp.insert(min_index, node)
    return temp
Sorting the tuples of coordinates only by (shortest) distance still doesn't solve the problem. And even sorting based on angle could lead to another problem, since not every building is rectangular in shape.
This is how far I got by sorting based on distance: plotted image
Is there any better way to do this? Or did I do it wrongly? I've tried this for 2 days straight. I am sorry if it's too trivial, but I am really new to coding and need to get this done. Thanks for the help.
edit: answer to scai
Here's my current method of extracting the nodes:
import osmium as osm

def way_filter():
    class WayFilter(osm.SimpleHandler):
        def __init__(self):
            super(WayFilter, self).__init__()
            self.nodes = []

        def way(self, w):
            if 'building:part' in w.tags and w.tags['building:part'] == 'hospital':
                temp = []
                for n in w.nodes:
                    temp.append(n.ref)
                self.nodes.append(temp)

    ways = WayFilter()
    ways.apply_file(map)
    return ways.nodes

def get_node(ref_node):
    class ObjectCounterHandler(osm.SimpleHandler):
        def __init__(self):
            osm.SimpleHandler.__init__(self)
            self.location = []
            self.ref = ref_node

        def write_object(self, lon, lat):
            self.location.append([lon, lat])

        def node(self, n):
            try:
                if any(n.id in sublist for sublist in self.ref):
                    self.write_object(n.location.lon, n.location.lat)
            except TypeError:
                if n.id in self.ref:
                    self.write_object(n.location.lon, n.location.lat)

    h = ObjectCounterHandler()
    h.apply_file(map)
    return h.location
Main Program
a = way_filter()
for ref in a:
    b = get_node(ref)
    c = next(colors)
    loc = []
    for x in b:
        loc.append(tuple(x))
    # plot the points
    polygons = Polygon(loc)
    x, y = polygons.exterior.xy
    plt.plot(x, y, zorder=1)

mplleaflet.show()
And here is the result without sorting: plot image without sort
Nodes are referenced by ways in the correct order, i.e. so that they are adjacent to each other. You don't need to perform any manual sorting if you read the list of referenced node IDs correctly. Manual sorting is only required for relation elements.
Unfortunately I'm not familiar with pyosmium so I can't tell you what's wrong with your code.
I'm writing a recursive breadth-first traversal of a network. The problem I ran into is that the network often looks like this:
1
/ \
2 3
\ /
4
|
5
So my traversal starts at 1, then traverses to 2, then 3. The next step is to proceed to 4, so 2 traverses to 4. After this, 3 traverses to 4, and suddenly I'm duplicating work as both branches try to traverse to 5.
The solution I've found is to create a list called self.already_traversed, and every time a node is traversed, I append it to the list. Then, when I'm traversing from node 4, I check to make sure it hasn't already been traversed.
The problem here is that I'm using an instance variable for this, so I need a way to set up the list before the first recursion and a way to clean it up afterwards. The way I'm currently doing this is:
self.already_traversed = []
self._traverse_all_nodes(start_id)
self.already_traversed = []
Of course, it sucks to be twiddling variables outside of the function that's using them. Is there a better way to do this so it can be put into my traversal function?
Here's the actual code, though I recognize it's a bit dense:
def _traverse_all_nodes(self, root_authority, max_depth=6):
    """Recursively build a networkx graph

    Process is:
     - Work backwards through the authorities for self.cluster_end and all
       of its children.
     - For each authority, add it to a networkx graph, if:
        - it happened after self.cluster_start
        - it's in the Supreme Court
        - we haven't exceeded a max_depth of six cases
        - we haven't already followed this path
    """
    g = networkx.Graph()
    if hasattr(self, 'already_traversed'):
        is_already_traversed = (root_authority.pk in self.already_traversed)
    else:
        # First run. Create an empty list.
        self.already_traversed = []
        is_already_traversed = False

    is_past_max_depth = (max_depth <= 0)
    is_cluster_start_obj = (root_authority == self.cluster_start)
    blocking_conditions = [
        is_past_max_depth,
        is_cluster_start_obj,
        is_already_traversed,
    ]
    if not any(blocking_conditions):
        print("  No blocking conditions. Pressing on.")
        self.already_traversed.append(root_authority.pk)
        for authority in root_authority.authorities.filter(
                docket__court='scotus',
                date_filed__gte=self.cluster_start.date_filed):
            g.add_edge(root_authority.pk, authority.pk)
            # Combine our present graph with the result of the next
            # recursion
            g = networkx.compose(g, self._traverse_all_nodes(
                authority,
                max_depth - 1,
            ))

    return g

def add_clusters(self):
    """Do the network analysis to add clusters to the model.

    Process is to:
     - Build a networkx graph
     - For all nodes in the graph, add them to self.clusters
    """
    self.already_traversed = []
    g = self._traverse_all_nodes(
        self.cluster_end,
        max_depth=6,
    )
    self.already_traversed = []
Check out:
How do I pass a variable by reference?
which contains an example of how to pass a list by reference. If you pass the list as an argument, every recursive call will refer to the same list, so no instance variable is needed.
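One common version of that pattern is to make the visited collection a parameter with a None default, so the function does its own setup on the first call and no outside cleanup is needed. A minimal sketch on a plain dict-of-lists graph (not the asker's networkx/Django code):

```python
def traverse(graph, node, visited=None):
    """Recursively visit each reachable node exactly once."""
    if visited is None:      # first call: create the bookkeeping set
        visited = set()
    if node in visited:      # already handled via another branch
        return []
    visited.add(node)
    order = [node]
    for neighbour in graph.get(node, []):
        # recursive calls share the same visited set
        order.extend(traverse(graph, neighbour, visited))
    return order

# Diamond-shaped graph from the question: 4 is reachable via both 2 and 3
g = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
print(traverse(g, 1))  # [1, 2, 4, 5, 3] -- node 4 is visited only once
```

Note the default must be None rather than set(), because a mutable default argument would itself be shared across top-level calls.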