I am rather new to Python and NetworkX. I need to create a list similar to Edgelist=[(0,1),(0,3),(1,0),(1,2),(1,4),(2,1),(2,5)], whose elements represent the starting and ending nodes of an edge (link) that is in turn part of a network.
Rather than setting them manually, I want Python to create the couples you see in the list by randomly selecting the integer values of (start,end) from an assigned range of values (namely, 0, 999), which represent the node IDs. Then, I want to make sure that every node ID is included at least once in the series of (start,end) values (this means that all my nodes will be connected to at least one other node).
I know I could use random.randint(0, 999) but I don't know how to "nest" it into the creation of a list (perhaps a for loop?). I wish I had some code to show you but this is my first attempt at working with NetworkX!
EDIT
To give you a visual idea of what I mean, here are two images. The first is a regular network (aka lattice), and the second is a random one. The edge list of the first was created manually in order to reproduce a chess table, while the second displays an edge list which is a (manually) shuffled counterpart of the first one. As you see, the nodes are kept in exactly the same locations. Hope this helps a bit more. Thanks!
There is a similar answer, but for a complete graph, at: How to generate a fully connected subgraph from node list using python's networkx module
In your case, using zubinmehta's answer:
import networkx
import itertools

def complete_graph_from_list(L, create_using=None):
    # Start from an empty graph and use the labels in L as the nodes
    G = networkx.empty_graph(0, create_using)
    G.add_nodes_from(L)
    if len(L) > 1:
        if G.is_directed():
            edges = itertools.permutations(L, 2)
        else:
            edges = itertools.combinations(L, 2)
        G.add_edges_from(edges)
    return G
You could build the graph as:
S = complete_graph_from_list([str(x) for x in range(1000)])
print(S.edges())
Here is a networkx command that will create a graph such that each node has exactly one edge:
import networkx as nx
G = nx.configuration_model([1]*1000)
If you look into the guts of it, it does the following which answers your question - each node will appear in exactly one edge.
import random
mylist = list(range(start, end))
random.shuffle(mylist)  # shuffles in place and returns None
edgelist = []
while mylist:
    edgelist.append((mylist.pop(), mylist.pop()))
You should guarantee that mylist has even length before going through the popping.
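The even-length requirement above can be checked up front. This is a minimal sketch, not the NetworkX implementation; the function name pair_up and the seed parameter are made up for illustration.

```python
import random

def pair_up(node_ids, seed=None):
    """Pair shuffled node IDs so that each ID appears in exactly one edge."""
    nodes = list(node_ids)
    if len(nodes) % 2:
        # With an odd count, one node would be left without a partner
        raise ValueError("need an even number of nodes to pair them all")
    random.Random(seed).shuffle(nodes)
    # Take the shuffled nodes two at a time
    return [(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]

edgelist = pair_up(range(1000), seed=0)
print(len(edgelist))
```

Raising an error is only one option; depending on the use case, you could instead attach the leftover node to a randomly chosen existing node.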
Python has a built-in library called itertools.
Here is a sample showing how to achieve what you described:
import itertools
values = [3, 4, 6, 7]
sublist_length = 2
comb = itertools.combinations(values, sublist_length)
This will return comb as an iterator.
You can call next(comb) to get the next element of the iterator, or iterate over it with a for loop to get all the results, as below.
for item in comb:
    print(item)
which should output:
(3, 4)
(3, 6)
(3, 7)
(4, 6)
(4, 7)
(6, 7)
I hope this will solve your problem.
For the list creation you can do something like:
import random
max_id = 999
min_id = 0
original_values = range(min_id, max_id + 1)  # could be an arbitrary list
n_edges = 5000  # some number; this value is only an example
my_edge_list = [(random.choice(original_values), random.choice(original_values))
                for _ in range(n_edges)]
To assert you have all values in there you can do the following
vals = set([v for tup in my_edge_list for v in tup])
assert all([v in vals for v in original_values])
The assert will make sure every value is represented in your edges. To avoid hitting that assert, you can do a couple of things:
Sample without replacement from your list of integers until they are all gone to create a "base network", and then randomly add more edges to your heart's desire.
Make n_edges high enough that your condition is very likely to be met. If it's not, try again...
It really depends on what you're going to use the network for and what kind of structure you want it to have.
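The first suggestion (a "base network" plus extra random edges) could be sketched as follows. This is one possible variation, not the only way to do it: the base network here chains the shuffled nodes, which uses every node once and also happens to make the graph connected. The names random_edge_list, n_extra_edges and seed are made up for illustration.

```python
import random

def random_edge_list(node_ids, n_extra_edges, seed=None):
    """Random edge list in which every node ID appears at least once."""
    rng = random.Random(seed)
    nodes = list(node_ids)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to build an edge")
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    # Base network: link consecutive shuffled nodes, so no node is left out
    edges = list(zip(shuffled, shuffled[1:]))
    # Extra edges: two distinct nodes drawn at random (duplicate edges are possible)
    for _ in range(n_extra_edges):
        start, end = rng.sample(nodes, 2)
        edges.append((start, end))
    return edges

edge_list = random_edge_list(range(1000), n_extra_edges=2000, seed=42)
covered = {node for edge in edge_list for node in edge}
print(len(edge_list), len(covered))
```

Because coverage is guaranteed by construction, no retry loop is needed.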
EDIT: I have updated my response to be more robust to an arbitrary list of values rather than requiring a sequential list
import random
from random import randint
random.seed()  # with no argument, seeds from the OS or the current time
# to generate 100 tuples with randints in range 0-99
li = [(randint(0, 99), randint(0, 99)) for i in range(100)]
print(li)
[(80, 55), (3, 10), (66, 65), (26, 23), (8, 72), (83, 25), (24, 99), (72, 9), (52, 76), (72, 68), (67, 25), (72, 18), (94, 62), (7, 62), (49, 94), (29, 89), (11, 38), (52, 51), (19, 32), (20, 85), (56, 61), (4, 40), (97, 58), (82, 2), (50, 82), (77, 5), (2, 9), (2, 46), (39, 4), (74, 40), (69, 15), (1, 77), (45, 58), (80, 59), (85, 80), (27, 80), (81, 4), (22, 33), (77, 60), (75, 87), (43, 36), (60, 34), (90, 54), (75, 3), (89, 84), (51, 93), (62, 64), (81, 50), (15, 60), (33, 97), (42, 62), (83, 26), (13, 33), (41, 87), (29, 63), (4, 32), (6, 14), (79, 73), (95, 4), (41, 16), (96, 64), (15, 28), (35, 13), (35, 82), (77, 16), (63, 27), (75, 37), (11, 52), (21, 35), (37, 96), (9, 86), (83, 11), (5, 42), (34, 32), (17, 8), (65, 55), (58, 19), (90, 40), (18, 75), (29, 14), (0, 11), (25, 68), (34, 52), (22, 8), (12, 53), (16, 49), (73, 54), (78, 80), (74, 60), (40, 68), (69, 20), (37, 38), (74, 60), (53, 90), (25, 48), (44, 52), (49, 27), (28, 35), (29, 94), (35, 60)]
Here is a solution that first generates a random population of nodes (pop1), then shuffles it (pop2) and combines those into a list of pairs.
Note: this method only yields edges where each node is exactly once a start and exactly once an end, so it may not be what you're after. See below for another method.
import random, copy
random.seed() # defaults to the current time or an OS randomness source
# extract a number of samples - the number of nodes you want
pop1 = random.sample(range(1000), 10)
pop2 = copy.deepcopy( pop1 )
random.shuffle( pop2 )
# generate pairs from the same population - this guarantees your constraint
pairs = list(zip( pop1, pop2 ))
print(pairs)
Output:
[(17, 347), (812, 688), (347, 266), (731, 342), (342, 49), (904, 17), (49, 731), (50, 904), (688, 50), (266, 812)]
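One caveat with pairing a population against a shuffled copy of itself is that a node can end up paired with itself. If self-loops are unwanted, one possible variation (an assumption on my part, not part of the method above) is to shuffle once and pair each node with its successor in the shuffled order, wrapping around at the end. The function name random_cycle_pairs is made up, and the sketch assumes the node IDs are unique.

```python
import random

def random_cycle_pairs(node_ids, seed=None):
    """Each node is a start exactly once and an end exactly once, with no self-loops."""
    nodes = list(node_ids)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes")
    random.Random(seed).shuffle(nodes)
    # Pair every node with the next one in shuffled order; the last wraps to the first
    return list(zip(nodes, nodes[1:] + nodes[:1]))

pairs = random_cycle_pairs(range(10), seed=1)
print(pairs)
```

Note that this always produces a single cycle through all nodes, which is a more restricted structure than an arbitrary shuffle.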
Here is another method
This allows for duplicate occurrences of the nodes.
The idea is to draw start and end nodes from the same population:
import random
random.seed()
population = range(10) # any population would do
# choose randomly from the population for both ends
# so you can have duplicates
pairs = [(random.choice(population), random.choice(population)) for _ in range(100)]
print(pairs[:10])
Output:
[(1, 9), (7, 1), (8, 6), (4, 7), (6, 2), (7, 3), (0, 2), (1, 0), (8, 3), (8, 3)]
Here is my vectorized version:
import pandas as pd
import numpy as np
alpha=.001
data = [ (2, 3), (4, 7), (6, 11), (8, 17), (10, 23), (12, 31), (14, 39), (16, 49), (18, 59), (20, 71), (22, 83), (24, 97), (26, 113), (28, 131), (30, 149), (32, 169), (34, 191), (36, 214), (38, 239), (40, 266), (42, 295), (44, 326), (46, 359), (48, 394), (50, 431)]
##CREATE EXAMPLES MATRIX
x_coordinates = [x[0] for x in data]
x_coords = [[1, x] for x in x_coordinates]
# Creates a list of all x-coordinates with a leading column of 1s
examples = pd.DataFrame(x_coords).transpose()
# Uses that list to create a DataFrame. Must transpose so it has dimension (2, 25): rows are features, columns are specific examples.
##CREATE THETA MATRIX/VECTOR
theta_list = [1, 2]
theta = pd.DataFrame(theta_list).transpose()
#creates a df of dimension 1,2.
##CREATE Y VECTOR/MATRIX
y_coordinates = [x[1] for x in data]
y=pd.DataFrame(y_coordinates).transpose()
deriv=pd.DataFrame([])
count=0
while (deriv != 0).all().all() and count <= 500000:
    length = len(data)
    # theta*X
    thetaX = theta.dot(examples)
    error = thetaX - y
    error_pt2 = error.dot(examples.T)
    deriv = alpha * (1 / length) * error_pt2
    theta = theta - deriv
    print(theta)
    count += 1
print(count)
Here is my version made with loops:
total=0
th0=0
th1=0
alpha=0.001
deriv0=1
deriv1=1
count=0
while deriv0 and deriv1 != 0 and count<=1000000:
    total0=0
    total1=0
    #th0
    for i in data:
        hyp=th0+(th1*i[0])
        #print("Hyp is {}".format(hyp))
        total0+=(hyp-i[1])
    deriv0=(1/25)*total0
    th0temp=th0-(alpha*(deriv0))
    #th1
    for i in data:
        hyp=th0+(th1*i[0])
        total1+=(hyp-i[1])*i[0]
    deriv1=(1/25)*total1
    th1temp=th1-(alpha*(deriv1))
    th0=th0temp
    th1=th1temp
    th0temp=0
    th1temp=0
    count+=1
    print("Theta 0: {} \n Theta 1: {} \n\n".format(th0,th1))
print(count)
When I run the vectorized version, it takes almost 10 times as long as the loop version. I would have expected that using vectors would make this code much more efficient than running multiple loops. What gives? Is it just the computational overhead of Pandas that makes this run slower? Maybe Pandas isn't suited for this sort of algorithm.
I have a list of tuples with character spans. But there are instances where there is an overlap of the spans.
My aim is to modify the tuple list so that, among overlapping spans, only the largest span is kept and the smaller ones are deleted.
Example:
Original list: [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
Modified list: [(2, 3), (7, 9), (10, 12), (15, 17), (20, 29)]
Here (10,11), (16,17), (20,21) and (21,28) were removed because they overlap with the larger spans (10,12), (15,17) and (20,29) respectively.
I found some answers that deal with overlaps like this, but they don't deal with keeping the larger span.
My thought was to sort by span length in descending order and then search for overlaps somehow. This search for overlaps is the part I cannot figure out.
Code
values = [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
outputs = []
for value in values:
    flag = True
    outputsLoop = outputs[:]
    for output in outputsLoop:
        fromVal, toVal = output
        if value[0] in range(fromVal,toVal+1) or value[1] in range(fromVal,toVal+1):
            if (value[1]-value[0]) > (toVal - fromVal):
                outputs.remove(output)
            else:
                flag = False
    if flag:
        outputs.append(value)
print(outputs)
Output
[(2, 3), (7, 9), (10, 12), (15, 17), (20, 29)]
Code explanation
The code loops through the values, checks each one against the spans kept so far, and keeps whichever span is larger when two overlap. It is probably not the most efficient or the cleanest code, but it works, and I believe I can optimize it further when I have time.
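The idea from the question (sort by span length in descending order, then look for overlaps) could be sketched like this. It is a greedy alternative, not a rewrite of the code above: spans are treated as closed intervals, so spans that merely share an endpoint count as overlapping, and the function name keep_largest_spans is made up for illustration.

```python
def keep_largest_spans(spans):
    """Greedily keep the longest spans, dropping any span that overlaps a kept one."""
    kept = []
    # Longest spans first; ties are broken by the smaller start value
    for start, end in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
        # Two closed intervals are disjoint when one ends before the other starts
        if all(end < k_start or start > k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)

values = [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
print(keep_largest_spans(values))
# [(2, 3), (7, 9), (10, 12), (15, 17), (20, 29)]
```

Because longer spans are considered first, a span is only ever rejected in favour of one that is at least as long.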
I am trying to find the k closest points to a target. For example, I have this set of points:
points = [(0,0), (19, 8), (23, 11), (25, 22), (30, 26), (23, 20), (30,0)]
And I want to find the k closest points for each element of the list. I have tried this code:
def kClosest(target, points, k):
    return sorted(points, key=lambda x: (target[0]-x[0])**2 + (target[1]-x[1])**2)[:k]
for i in points:
    points_copy = points.copy()
    points_copy.remove(i)
    print(i, ": ", kClosest(i, points_copy, 3))
The output will be like this (which I expect):
(0, 0) : [(19, 8), (23, 11), (30, 0)]
(19, 8) : [(23, 11), (23, 20), (30, 0)]
(23, 11) : [(19, 8), (23, 20), (25, 22)]
(25, 22) : [(23, 20), (30, 26), (23, 11)]
(30, 26) : [(25, 22), (23, 20), (23, 11)]
(23, 20) : [(25, 22), (23, 11), (30, 26)]
(30, 0) : [(23, 11), (19, 8), (23, 20)]
Now I want to modify the output to result in the index of the points. So, I hope the output for point (0,0) is [1, 2, 6]. How can we modify the code to produce the expected output?
Take a look at the Neighbors module from sklearn:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.neighbors
This will provide you with optimized methods to do this.
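I also think that you should use an optimized module like the one from sklearn, as stated in @aurelien_morel's answer.
In any case, a very simple solution for what you are asking is to work with the indexes of the list points: sort the indexes by the distance of the corresponding point and return the first k. The indexes must refer to the original list, so skip the target by its index instead of removing it from a copy (removing it would shift the indexes of the points after it).
points = [(0,0), (19, 8), (23, 11), (25, 22), (30, 26), (23, 20), (30,0)]
def kClosest(target_idx, points, k):
    target = points[target_idx]
    # indexes of every point except the target itself
    candidates = [idx for idx in range(len(points)) if idx != target_idx]
    return sorted(candidates, key=lambda idx: (target[0]-points[idx][0])**2 + (target[1]-points[idx][1])**2)[:k]
for i, point in enumerate(points):
    print(point, ": ", kClosest(i, points, 3))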
This is a rather primitive method of getting what you need, but doesn’t involve using sklearn: before calling kClosest() add the index of the point to its value, like this:
points_with_index = [l + (i,) for i, l in enumerate(points)]
Use points_with_index in your call to kClosest() instead of points. Now when you print value from points_with_index use value[2] to show just the original index.
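The same idea of carrying each point's original index along with it can be sketched with enumerate and heapq.nsmallest from the standard library. This is only an illustration; the function name k_closest_indices is made up, and squared distances are used because they give the same ordering as true distances.

```python
import heapq

points = [(0, 0), (19, 8), (23, 11), (25, 22), (30, 26), (23, 20), (30, 0)]

def k_closest_indices(target_idx, points, k):
    """Indexes (into the original list) of the k points nearest to points[target_idx]."""
    tx, ty = points[target_idx]
    # Keep (index, point) pairs together and skip the target itself
    candidates = ((idx, p) for idx, p in enumerate(points) if idx != target_idx)
    nearest = heapq.nsmallest(
        k, candidates,
        key=lambda item: (item[1][0] - tx) ** 2 + (item[1][1] - ty) ** 2,
    )
    return [idx for idx, _ in nearest]

for i, point in enumerate(points):
    print(point, ": ", k_closest_indices(i, points, 3))
```

For the point (0, 0) this prints [1, 2, 6], matching the output requested in the question.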
I have a list of X Y tuple coordinates. I am trying to eliminate the coordinates that are very close to each other using the euclidean distance. However, the code so far does not perform as expected, especially as the number of coordinates increases.
So far, I have found online how to compare two lists of coordinates, but not the elements within the same list.
Hence, what I have done is split the list into its first element and the remainder, and then compare the euclidean distance between the first element and each remaining one. If an element is within the threshold distance, it is removed from the list. Then the list is updated and the procedure is repeated. However, it does not perform as expected.
from scipy.spatial import distance
# List of coordinates.
xy = [(123, 2191), (44, 2700), (125, 2958), (41, 3368), (33, 4379), (78, 4434), (75, 5897), (50, 6220), (75, 7271), (80, 7274), (58, 8440), (60, 8440), (59, 8441), (32, 9699), (54, 9758), (58, 9759), (43, 10113), (64, 10252), (57, 12118), (61, 12120), (60, 14129), (61, 14129), (66, 15932), (68, 15933), (53, 17302), (57, 17304), (84, 20012), (84, 20013), (102, 20222), (49, 21257), (47, 21653), (56, 27042), (51, 28200), (48, 28201), (55, 28202), (65, 29366), (43, 29484), (67, 29808), (32, 30840), (31, 30842), (48, 36368), (48, 36369), (49, 36369), (21, 37518), (102, 37519)]
uni = []
for x in xy[:]:
    for i, j in enumerate(xy):
        if i == 0:
            new_xy = j # New List comprising of first element of the list
            remaining_xy = list(set(xy) - set(new_xy)) # rest of list converted into a separate list
            for m in remaining_xy:
                print(new_xy , m, distance.euclidean(new_xy , m))
                if distance.euclidean(new_xy ,m) < 1000: # If distance less then threshold, remove.
                    remaining_xy.remove(m)
            xy = remaining_xy #reset xy
            remaining_xy = [] #reset remaining_xy
            uni.append(new_xy) # append unique values.
print(len((uni)), uni)
However, for example, the output shows
..., (53, 17302), (57, 17304), ...
Which does not satisfy the threshold.
For me your code is actually working. Maybe just change your last print statement to:
print(len(set(uni)), set(uni))
These outputs seem right for me. All coordinates in the set(uni) are more than 1000 apart from each other.
I get the following:
23 {(68, 15933), (58, 8440), (75, 7271), (51, 28200), (21, 37518), (61, 14129), (84, 20012), (65, 29366), (50, 6220), (49, 21257), (53, 17302), (41, 3368), (33, 4379), (64, 10252), (58, 9759), (56, 27042), (57, 12118), (78, 4434), (32, 30840), (31, 30842), (48, 36369), (48, 28201), (123, 2191)}
Update:
Unfortunately I haven't tested the complete output... I cannot directly find the issue in your code, but with a recursive function you will get the correct result you are looking for:
def recursiveCoord(_coordinateList):
    if len(_coordinateList) > 1:
        xy_0 = _coordinateList[0]
        # everything after the first coordinate; slicing keeps the original order
        remaining_xy = _coordinateList[1:]
        new_xy_list = []
        for coord in remaining_xy:
            dist = distance.euclidean(xy_0 ,coord)
            if dist >= 1000:
                new_xy_list.append(coord)
        return [xy_0] + recursiveCoord(new_xy_list)
    else:
        # zero or one coordinate left: nothing to compare against, so keep it
        return list(_coordinateList)
Call it like that:
uni = recursiveCoord(xy)
and you will get a list with all unique coordinates.
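A likely cause of the unexpected output in the question's code is that remaining_xy.remove(m) runs while the for loop is still iterating over remaining_xy; removing an item from a list during iteration makes Python skip the element that follows it. A non-recursive way to avoid this is to build a new list instead of mutating the one being iterated. The sketch below uses math.dist from the standard library (Python 3.8+) rather than scipy, and the function name filter_close_points is made up for illustration.

```python
import math

def filter_close_points(coords, min_dist):
    """Keep a point only if it is at least min_dist away from every point kept so far."""
    kept = []
    for point in coords:
        if all(math.dist(point, other) >= min_dist for other in kept):
            kept.append(point)
    return kept

sample = [(53, 17302), (57, 17304), (84, 20012), (84, 20013), (102, 20222)]
print(filter_close_points(sample, 1000))
# [(53, 17302), (84, 20012)]
```

The result depends on the order of the input, since the first point of each cluster is the one that is kept.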
I have the following code:
var_one = var_two[var_three-1]
var_one = "string_one" + var_1
And I need to do the following to it:
var_four = 'string_two', var_one
However, this returns the following error:
TypeError: Can't convert 'tuple' object to str implicitly
I have tried things such as str(var_one) and using strip but these did not work.
What can I do to achieve the result I require?
EDIT - Here are what the variables contain:
var_one: new variable
var_two: tuple
var_three: integer
var_four: new
EDIT2:
The line in the program that makes the error is: os.system(var_four)
one_big_string = ''.join(var_four)
print(one_big_string)
What you've written is fine:
>>> x = 1
>>> y = 1, x
>>>
The problem is that somewhere else in your code, you're using var_four as a string where it should be a tuple.
BTW, I think it's neater to put parentheses around tuples like this; otherwise I tend to think they're being used in tuple unpacking.
EDIT: There are all sorts of ways to join and format strings -- Python is good at that. In somewhat-decreasing order of generality:
"{first_thing} {second_thing}".format(first_thing=var_one, second_thing=var_two)
"{0} {1}".format(var_one, var_two)
var_one + var_two
Your code looks fine as is.
Try running import pdb; pdb.set_trace() in your program to see if you can find the line triggering the error.
EDIT: You'll want to use ''.join(var_four) to convert var_four into a string before adding it to whatever it is you want to use it. Please note that this will actually create a new string and not overwrite var_four. See Python 3 string.join() equivalent?
Also, you should be using the subprocess module instead of os.system. See the Python 3.x documentation.
os.system expects a string, which it will execute in the shell, but you're giving it a tuple instead.
Imagine we want to run the command rm -rf /home/mike. You might be doing something like
binary_and_option = 'rm -rf'
directory = '/home/mike'
command = binary_and_option, directory # This is the tuple
# ('rm -rf', '/home/mike')
# it is NOT the string
# 'rm -rf /home/mike'
os.system(command) # this clearly won't work, since it's just
# os.system(('rm -rf', '/home/mike'))
What you want to do instead is
command = "%s %s" % (binary_and_option, directory)
to assemble the string. You are probably thinking comma assembles str-ed objects together with spaces in between, but that's only for print; it's not how strings work in general.
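The difference between the tuple and the assembled string can be seen in a small sketch. The command pieces below are hypothetical and nothing is executed; shlex.quote is shown as one way to protect an argument that contains spaces if the string is later handed to a shell.

```python
import shlex

binary_and_option = "ls -l"
directory = "/tmp/my dir"  # hypothetical path containing a space

as_tuple = binary_and_option, directory  # the comma builds a tuple
naive = "%s %s" % (binary_and_option, directory)
quoted = "%s %s" % (binary_and_option, shlex.quote(directory))

print(type(as_tuple).__name__)  # tuple
print(naive)                    # ls -l /tmp/my dir
print(quoted)                   # ls -l '/tmp/my dir'
```

Without quoting, a shell would split the path into two separate arguments.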
But wait, there's more! You never want to use os.system, especially when you're going to build commands. It invokes the shell (which introduces unnecessary security risks and other penalties) and has an inflexible API. Instead, use the subprocess module.
import subprocess
binary_and_option = ['rm', '-rf']
directory = '/home/mike'
command = binary_and_option + [directory]
subprocess.call(command)
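On Python 3.5 and later, subprocess.run is a common alternative to subprocess.call. The sketch below runs a harmless, hypothetical command (the current Python interpreter printing a word) so that it can be tried safely; it is an illustration of passing the command as a list, not a replacement for the code above.

```python
import subprocess
import sys

# The command is a list of arguments, so no shell is involved
command = [sys.executable, "-c", "print('hello')"]

# check=True raises CalledProcessError on a non-zero exit status;
# capture_output and text (Python 3.7+) return stdout as a str
result = subprocess.run(command, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```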
str(my_tuple)
This seems too easy, but this works in Python 3.6
>>> x = list(range(100))
>>> y = list(range(500, 600))
>>> zip_obj = zip(x, y)
>>> my_tuple = tuple(zip_obj)
>>> type(my_tuple)
<class 'tuple'>
>>> tuple_str = str(my_tuple)
>>> tuple_str
'((0, 500), (1, 501), (2, 502), (3, 503), (4, 504), (5, 505), (6, 506), (7, 507), (8, 508), (9, 509), (10, 510), (11, 511), (12, 512), (13, 513), (14, 514), (15, 515), (16, 516), (17, 517), (18, 518), (19, 519), (20, 520), (21, 521), (22, 522), (23, 523), (24, 524), (25, 525), (26, 526), (27, 527), (28, 528), (29, 529), (30, 530), (31, 531), (32, 532), (33, 533), (34, 534), (35, 535), (36, 536), (37, 537), (38, 538), (39, 539), (40, 540), (41, 541), (42, 542), (43, 543), (44, 544), (45, 545), (46, 546), (47, 547), (48, 548), (49, 549), (50, 550), (51, 551), (52, 552), (53, 553), (54, 554), (55, 555), (56, 556), (57, 557), (58, 558), (59, 559), (60, 560), (61, 561), (62, 562), (63, 563), (64, 564), (65, 565), (66, 566), (67, 567), (68, 568), (69, 569), (70, 570), (71, 571), (72, 572), (73, 573), (74, 574), (75, 575), (76, 576), (77, 577), (78, 578), (79, 579), (80, 580), (81, 581), (82, 582), (83, 583), (84, 584), (85, 585), (86, 586), (87, 587), (88, 588), (89, 589), (90, 590), (91, 591), (92, 592), (93, 593), (94, 594), (95, 595), (96, 596), (97, 597), (98, 598), (99, 599))'
>>>