Closest pair of points by brute force - Python

I do not know what is wrong with my code. I generate 100 random points and I want to find the closest pair of these points, but the result is wrong.
#Closest pair
from math import sqrt
from random import randint
arr1=[]
dist=0
p1=[]
p2=[]
min1=1000
for i in range(0, 100):
    arr1.append([randint(0,100),randint(0,100)])
print(arr1)
print("\n")
def dist(a,b):
    x=pow((a[0]-b[0]),2)
    y=pow((a[1]-b[1]),2)
    return sqrt(x+y)
for i in range(0, len(arr1)):
    for j in range(i+1, len(arr1)):
        dis=dist(arr1[i],arr1[j])
        if(dis<min1):
            min1=dis
            p1=arr1[i]
            p2=arr1[j+1]
print(p1,"",p2,min1)
#print (sorted(arr1))

Okay, you treat the pairs (1, 5) and (5, 1) as the same pair, which is correct. However, while looping j from i+1 to the end of the list you store arr1[j+1], which I think is wrong. Consider the case where j is the last index (99) and that point is the closest so far: you end up reading arr1[100], which is past the end of the list.

As İhsan Cemil Çiçek mentions, the main problem with your code is that you have p2=arr1[j+1], which should be p2=arr1[j].
However, there are a couple of things you can do to make this code more efficient.
There's no need to take the square root for every distance test. For non-negative d1 and d2, if sqrt(d1) < sqrt(d2) then d1 < d2, so we can just test the squared distances, and we only need to do a single expensive square root calculation when we've found the minimum.
Python has an efficient min function, so there's no need to find the minimum manually. Normally, min does a simple comparison of the values you pass it, but you can also supply it with a key function which it will use to make the comparisons.
You can use the combinations function from the standard itertools module to produce pairs of items from your points list with a single loop. This doesn't save much time, but it's cleaner than having a double loop.
Also, it's a good idea to supply a seed value to the random number generator when developing code that produces random values. This makes it easier to test & debug your code because it makes the results reproducible.
In the code below I've increased the range of the coordinates, because with 100 points with coordinates in the range 0 to 100 there's a high chance of generating duplicate points. You might like to use a set instead of a list if you don't want duplicate points.
from math import sqrt
from random import seed, randint
from itertools import combinations
seed(17)
high = 1000
numpoints = 100
points = [(randint(0, high), randint(0, high)) for _ in range(numpoints)]
points.sort()
print(points, '\n')
def dist(t):
    a, b = t
    x = a[0] - b[0]
    y = a[1] - b[1]
    return x*x + y*y
t = min(combinations(points, 2), key=dist)
a, b = t
print('{} {}: {}'.format(a, b, sqrt(dist(t))))
output
[(9, 51), (18, 443), (19, 478), (21, 635), (27, 254), (50, 165), (52, 918), (55, 746), (70, 316), (95, 707), (112, 939), (113, 929), (126, 903), (132, 256), (143, 832), (145, 698), (154, 692), (187, 200), (197, 765), (201, 154), (203, 317), (217, 51), (244, 119), (257, 983), (258, 880), (264, 76), (273, 65), (279, 343), (296, 178), (325, 655), (326, 174), (338, 552), (340, 96), (363, 51), (368, 59), (381, 585), (383, 593), (393, 834), (411, 140), (412, 496), (419, 83), (485, 648), (491, 76), (513, 821), (519, 962), (534, 424), (539, 980), (545, 572), (549, 312), (555, 87), (564, 63), (566, 923), (568, 545), (570, 218), (577, 537), (592, 801), (618, 848), (655, 614), (673, 413), (674, 314), (677, 284), (702, 141), (702, 215), (721, 553), (732, 654), (749, 974), (762, 279), (764, 429), (766, 732), (770, 756), (771, 356), (784, 722), (789, 319), (792, 5), (805, 282), (810, 896), (821, 978), (824, 911), (826, 310), (830, 323), (831, 418), (832, 518), (836, 400), (859, 256), (862, 996), (866, 700), (879, 485), (888, 415), (903, 722), (930, 588), (931, 496), (938, 356), (942, 323), (942, 344), (948, 429), (967, 741), (980, 254), (982, 488), (982, 604), (983, 374)]
(381, 585) (383, 593): 8.246211251235321

It will only work for the first point. For every other point in the list you only check the remaining points (i+1 to n), not all points; the closest one may also lie in 0 to i.
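For what it's worth, the concern above can be tested directly. The sketch below is only an illustration (the seed, the point count and the helper name sq_dist are arbitrary choices, not from the question). It suggests that starting j at i+1 still visits every unordered pair exactly once, because a pair (i, j) with j < i was already compared when the outer loop was at j:

```python
from itertools import combinations
from random import seed, randint

seed(0)  # arbitrary seed, only so the run is reproducible
points = [(randint(0, 1000), randint(0, 1000)) for _ in range(50)]
n = len(points)

def sq_dist(a, b):
    """Squared Euclidean distance (integers, so comparisons are exact)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# Loop shape used in the question: j starts at i + 1.
half_scan = min(sq_dist(points[i], points[j])
                for i in range(n) for j in range(i + 1, n))

# Full scan over every ordered pair with i != j.
full_scan = min(sq_dist(points[i], points[j])
                for i in range(n) for j in range(n) if i != j)

# The i + 1 loop visits exactly the n * (n - 1) / 2 unordered pairs.
pair_count = sum(1 for _ in combinations(range(n), 2))

print(half_scan == full_scan, pair_count == n * (n - 1) // 2)
```

Since distance is symmetric, both scans should report the same minimum; the full scan just does twice the work.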

You should use enumerate in the for loops. Right now you compare point i only with the points that come after it in the array; what about the points before it?
Also, you need to save the two points that satisfy the distance condition, i.e. the points at i and j. Why arr1[j+1]?
Try this, I think it should work:
from math import sqrt
from random import randint
arr1=[]
dist=0
p1=[]
p2=[]
min1=1000
for i in range(0, 100):
    arr1.append([randint(0,100),randint(0,100)])
print(arr1)
print("\n")
def dist(a,b):
    x=pow((a[0]-b[0]),2)
    y=pow((a[1]-b[1]),2)
    return sqrt(x+y)
for i,x in enumerate (arr1):
    for j,y in enumerate (arr1):
        if (x != y):
            dis=dist(arr1[i],arr1[j])
            if(dis<min1):
                min1=dis
                p1=arr1[i]
                p2=arr1[j]
print(p1,"",p2,min1)
print (sorted(arr1))

Related

Using Python Pandas, why is my vectorized gradient descent so much slower than my gradient descent using loops?

Here is my vectorized version:
import pandas as pd
import numpy as np
alpha=.001
data = [ (2, 3), (4, 7), (6, 11), (8, 17), (10, 23), (12, 31), (14, 39), (16, 49), (18, 59), (20, 71), (22, 83), (24, 97), (26, 113), (28, 131), (30, 149), (32, 169), (34, 191), (36, 214), (38, 239), (40, 266), (42, 295), (44, 326), (46, 359), (48, 394), (50, 431)]
##CREATE EXAMPLES MATRIX
x_coordinates = [x[0] for x in data]
x_coords=[]
[x_coords.append([1,x]) for x in x_coordinates]
#Creates a list of all x-coordinates with a 1 column
examples=pd.DataFrame(x_coords).transpose()
#uses that list to create a dataframe. Must transpose so it has dimension 2,25. rows are features, columns are specific examples.
##CREATE THETA MATRIX/VECTOR
theta_list = [1, 2]
theta = pd.DataFrame(theta_list).transpose()
#creates a df of dimension 1,2.
##CREATE Y VECTOR/MATRIX
y_coordinates = [x[1] for x in data]
y=pd.DataFrame(y_coordinates).transpose()
deriv=pd.DataFrame([])
count=0
while (deriv != 0).all().all() and count <= 500000:
    length=len(data)
    #theta*X
    thetaX=theta.dot(examples)
    error=thetaX-y
    error_pt2=error.dot(examples.T)
    deriv=alpha*(1/length)*error_pt2
    theta=theta-deriv
    print(theta)
    count+=1
print(count)
Here is my version made with loops:
total=0
th0=0
th1=0
alpha=0.001
deriv0=1
deriv1=1
count=0
while deriv0 and deriv1 != 0 and count<=1000000:
    total0=0
    total1=0
    #th0
    for i in data:
        hyp=th0+(th1*i[0])
        #print("Hyp is {}".format(hyp))
        total0+=(hyp-i[1])
    deriv0=(1/25)*total0
    th0temp=th0-(alpha*(deriv0))
    #th1
    for i in data:
        hyp=th0+(th1*i[0])
        total1+=(hyp-i[1])*i[0]
    deriv1=(1/25)*total1
    th1temp=th1-(alpha*(deriv1))
    th0=th0temp
    th1=th1temp
    th0temp=0
    th1temp=0
    count+=1
    print("Theta 0: {} \n Theta 1: {} \n\n".format(th0,th1))
print(count)
When I run the vectorized version, it takes almost 10 times as long to run. I would have expected that using vectors would make this code much more efficient than running multiple loops. What gives? Is it just the computational overhead of Pandas that makes this run slower? Maybe Pandas isn't suited to this sort of algorithm?

Python: extending array keeps square bracket

I am trying to extend a list to add additional values but in the results it keeps displaying the end of the previous list.
def landmarksPoint():
    landmarkPoints = []
    # Check for range of landmarks (0 to 23) within the image, if all are displayed then continue to save the file.
    for n in range(pointNumber):
        # Split each line and column to save to text file and save to landmarkPoints Array.
        x = landmarks.part(n).x
        y = landmarks.part(n).y
        # Print each line for testing and append it to array.
        print("x:", x, " y:", y)
        landmarkPoints.append((x, y))
    return landmarkPoints
for hand in hands:
    landmarks = predictor(imageGray1, composite1)
    points1.append(landmarksPoint())
print(points1)
boundaryLoc = (1,1), (700,1), (1590, 1), (1590,500), (1590, 1190), (700, 1190), (1, 1190), (1,500)
points1.extend(boundaryLoc)
print(points1)
OUTPUT:
[[(992, 191), (1178, 337), (895, 702), (859, 873), (831, 991), (836, 514), (794, 627), (762, 768), (744, 900), (770, 396), (728, 479), (705, 586), (1213, 458), (690, 703), (773, 229), (803, 140), (1228, 147), (1281, 543), (1082, 471), (1027, 576), (996, 712), (970, 841), (933, 966), (922, 563)], (1, 1), (700, 1), (1590, 1), (1590, 500), (1590, 1190), (700, 1190), (1, 1190), (1, 500)]
The docs say that list.extend() extends the calling object with the contents of an argument that is an iterable.
So, points1.extend(boundaryLoc) extends the list points1 using the contents of the tuple boundaryLoc (you can verify that boundaryLoc is a tuple of tuples by examining the result of type(boundaryLoc)).
This means that each tuple contained within boundaryLoc will in effect be appended to points1, which is exactly what your output shows.
If you want to append a list of tuples to points1, you can do this:
boundaryLoc = [(1,1), (700,1), (1590, 1), (1590,500), (1590, 1190), (700, 1190), (1, 1190), (1,500)]
points1.append(boundaryLoc)
Note that we have explicitly made boundaryLoc a list (not a tuple) of tuples, and we use append() instead of extend().
If you really wanted to use extend(), you could do this:
points1.extend([boundaryLoc])
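To make the difference concrete, here is a small sketch using shortened stand-in data (the variable names nested, appended and flat are invented for illustration). The last case assumes that what was actually wanted is a single flat list of points, which the question does not state outright:

```python
landmarks = [(992, 191), (1178, 337)]   # shortened stand-in for the landmark points
boundary = [(1, 1), (700, 1)]           # shortened stand-in for boundaryLoc

# What the question does: the outer list holds one inner list, then extend()
# adds each boundary tuple as a separate top-level item.
nested = [list(landmarks)]
nested.extend(boundary)
print(nested)    # [[(992, 191), (1178, 337)], (1, 1), (700, 1)]

# append() adds the whole boundary list as a single item.
appended = [list(landmarks)]
appended.append(boundary)
print(appended)  # [[(992, 191), (1178, 337)], [(1, 1), (700, 1)]]

# Assumption: if one flat list of points is the goal, extend the inner list itself.
flat = list(landmarks)
flat.extend(boundary)
print(flat)      # [(992, 191), (1178, 337), (1, 1), (700, 1)]
```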

Assigning Variables to Tuples in a List

I am currently using a function to return a list of tuples (coordinates). I need to assign these coordinates variables so I can use them in a for loop.
My function is:
new_connect = astar.get_path(n1x, n1y, n2x, n2y)
With print(new_connect) I get the output:
[(76, 51), (75, 51), (74, 51), (73, 51), (72, 51), (71, 51), (70, 51), (69, 51), ...]
I need to assign each of these tuples to variables, i.e. (x, y),
so they can be used in the following for loop:
for x in range(new_connect):
    for y in range(new_connect):
        self.tiles[x][y].blocked = False
        self.tiles[x][y].block_sight = False
Which (should) plot the coordinates and change their tile values.
Any help is greatly appreciated. I've been stuck working on this and feel like I'm missing something super simple.
You can use unpacking:
new_connect = [(76, 51), (75, 51), (74, 51), (73, 51), (72, 51), (71, 51), (70, 51), (69, 51)]
for x, y in new_connect:
    print(x, y)
So, it isn't clear how range(new_connect) is actually working. It shouldn't. You should receive a TypeError, because a list object is not the correct argument to range.
That said, you should be able to create a for loop for a list of tuples by performing the tuple unpacking in the for statement itself.
for x, y in astar.get_path(...):
    ...
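Applied to the tile grid from the question, the unpacking loop might look like the sketch below. The Tile class, the map size and the shortened path are stand-ins made up so the example runs on its own; the real self.tiles and astar.get_path are not shown in the question:

```python
class Tile:
    """Hypothetical stand-in for the asker's tile class."""
    def __init__(self):
        self.blocked = True
        self.block_sight = True

width, height = 80, 60  # assumed map size, not taken from the question
tiles = [[Tile() for _ in range(height)] for _ in range(width)]

# Stand-in for the list returned by astar.get_path(...)
path = [(76, 51), (75, 51), (74, 51), (73, 51)]

# One loop over the path: each tuple is unpacked into x and y.
for x, y in path:
    tiles[x][y].blocked = False
    tiles[x][y].block_sight = False

opened = sum(not tile.blocked for column in tiles for tile in column)
print(opened)  # number of tiles opened up along the path
```

Note that this is a single loop over the path, not two nested loops: nesting would touch every (x, y) combination rather than just the coordinates on the path.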

Comparing X Y coordinates of the same list

I have a list of X Y tuple coordinates. I am trying to eliminate the coordinates that are very close to each other using the euclidean distance. However, the code so far does not perform as expected, especially as the number of coordinates increases.
So far, I have found online how to compare two lists of coordinates, but not the elements within the same list.
Hence, what I have done is split the list into its first element and the remainder, and then compare the euclidean distance between the first element and each remaining one. If an element is within the proximity threshold, it is removed from the list. Then the list is updated and the procedure repeated. However, it does not perform as expected.
from scipy.spatial import distance
# List of coordinates.
xy = [(123, 2191), (44, 2700), (125, 2958), (41, 3368), (33, 4379), (78, 4434), (75, 5897), (50, 6220), (75, 7271), (80, 7274), (58, 8440), (60, 8440), (59, 8441), (32, 9699), (54, 9758), (58, 9759), (43, 10113), (64, 10252), (57, 12118), (61, 12120), (60, 14129), (61, 14129), (66, 15932), (68, 15933), (53, 17302), (57, 17304), (84, 20012), (84, 20013), (102, 20222), (49, 21257), (47, 21653), (56, 27042), (51, 28200), (48, 28201), (55, 28202), (65, 29366), (43, 29484), (67, 29808), (32, 30840), (31, 30842), (48, 36368), (48, 36369), (49, 36369), (21, 37518), (102, 37519)]
uni = []
for x in xy[:]:
    for i, j in enumerate(xy):
        if i == 0:
            new_xy = j # New List comprising of first element of the list
            remaining_xy = list(set(xy) - set(new_xy)) # rest of list converted into a separate list
            for m in remaining_xy:
                print(new_xy , m, distance.euclidean(new_xy , m))
                if distance.euclidean(new_xy ,m) < 1000: # If distance less then threshold, remove.
                    remaining_xy.remove(m)
            xy = remaining_xy #reset xy
            remaining_xy = [] #reset remaining_xy
            uni.append(new_xy) # append unique values.
print(len((uni)), uni)
However, for example, the output shows
..., (53, 17302), (57, 17304), ...
Which does not satisfy the threshold.
For me your code is actually working. Maybe just change your last print statement to:
print(len(set(uni)), set(uni))
These outputs seem right for me. All coordinates in the set(uni) are more than 1000 apart from each other.
I get the following:
23 {(68, 15933), (58, 8440), (75, 7271), (51, 28200), (21, 37518), (61, 14129), (84, 20012), (65, 29366), (50, 6220), (49, 21257), (53, 17302), (41, 3368), (33, 4379), (64, 10252), (58, 9759), (56, 27042), (57, 12118), (78, 4434), (32, 30840), (31, 30842), (48, 36369), (48, 28201), (123, 2191)}
Update:
Unfortunately I haven't tested the complete output... I cannot directly find the issue in your code, but with a recursive function you will get the correct result you are looking for:
def recursiveCoord(_coordinateList):
    if len(_coordinateList) > 1:
        xy_0 = _coordinateList[0]
        # {xy_0} is a set holding the tuple itself; set(xy_0) would hold its two numbers
        remaining_xy = list(set(_coordinateList) - {xy_0})
        new_xy_list = []
        for coord in remaining_xy:
            dist = distance.euclidean(xy_0 ,coord)
            if dist >= 1000:
                new_xy_list.append(coord)
        return [xy_0] + recursiveCoord(new_xy_list)
    else:
        # zero or one coordinate left: keep it instead of dropping it
        return list(_coordinateList)
Call it like this:
uni = recursiveCoord(xy)
and you will get a list with all unique coordinates.
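A non-recursive alternative, offered only as a sketch: build a new list instead of removing items from the list being iterated over. The function name filter_close is invented here, and math.dist (Python 3.8+) is swapped in for scipy's distance.euclidean so the example has no third-party dependency:

```python
from math import dist  # Python 3.8+

def filter_close(points, threshold):
    """Keep a point only if it is at least `threshold` away from every kept point."""
    kept = []
    for p in points:
        if all(dist(p, q) >= threshold for q in kept):
            kept.append(p)
    return kept

# A few coordinates taken from the question's list
sample = [(58, 8440), (60, 8440), (59, 8441), (32, 9699), (54, 9758), (43, 10113)]
result = filter_close(sample, 1000)
print(result)  # [(58, 8440), (32, 9699)]
```

Because nothing is removed from a list while it is being looped over, no elements are skipped, and the input order is preserved. Which point of a close cluster survives depends on that order.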

How to generate a list with couples of random integers?

I am rather new to Python and NetworkX. I need to create a list similar to Edgelist=[(0,1),(0,3),(1,0),(1,2),(1,4),(2,1),(2,5)], whose elements represent the starting and ending node of an edge (link) that is in turn part of a network.
Rather than setting them manually, I want Python to create the couples you see in the list by randomly selecting the integer values of (start,end) from an assigned range of values (namely, 0, 999), which represent the node IDs. Then, I want to make sure that every node ID is included at least once in the series of (start,end) values (this means that all my nodes will be connected to at least one other node).
I know I could use random.randint(0, 999) but I don't know how to "nest" it into the creation of a list (perhaps a for loop?). I wish I had some code to show you but this is my first attempt at working with NetworkX!
EDIT
To give you a visual idea of what I mean, here are two images. The first is a regular network (aka lattice), and the second is a random one. The edge list of the first was created manually in order to reproduce a chess table, while the second displays an edge list which is a (manually) shuffled counterpart of the first one. As you see, the nodes are kept in exactly the same locations. Hope this helps a bit more. Thanks!
There is a similar answer, but for a complete graph, in "How to generate a fully connected subgraph from node list using python's networkx module".
In your case, using zubinmehta's answer:
import networkx
import itertools
def complete_graph_from_list(L, create_using=None):
    G = networkx.empty_graph(len(L),create_using)
    if len(L)>1:
        if G.is_directed():
            edges = itertools.permutations(L,2)
        else:
            edges = itertools.combinations(L,2)
        G.add_edges_from(edges)
    return G
You could build the graph as:
S = complete_graph_from_list([str(x) for x in range(0,1000)])
print(S.edges())
Here is a networkx command that will create a graph such that each node has exactly one edge:
import networkx as nx
G = nx.configuration_model([1]*1000)
If you look into the guts of it, it does the following which answers your question - each node will appear in exactly one edge.
import random
mylist = list(range(start, end))
random.shuffle(mylist)  # shuffles in place and returns None
edgelist = []
while mylist:
    edgelist.append((mylist.pop(),mylist.pop()))
You should guarantee that mylist has even length before going through the popping.
Python has a built-in library called itertools.
Here is a sample showing how to achieve what you described:
import itertools
values = [3, 4, 6, 7]
sublist_length = 2
comb = itertools.combinations(values, sublist_length)
This will return comb as an iterator.
You can call next(comb) to get the next element from the iterator, or loop over it with a for loop to get all the results, as below.
for item in comb:
    print(item)
which should output:
(3, 4)
(3, 6)
(3, 7)
(4, 6)
(4, 7)
(6, 7)
I hope this will solve your problem.
For the list creation you can do something like:
import random
max = 999
min = 0
original_values = range(min, max) # could be arbitrary list
n_edges = # some number..
my_edge_list = [(random.choice(original_values), random.choice(original_values))
                for _ in range(n_edges)]
To assert you have all values in there you can do the following
vals = set([v for tup in my_edge_list for v in tup])
assert all([v in vals for v in original_values])
The assert will make sure you have the proper representation in your edges. As far as doing your best to make sure you don't hit that assert you can do a couple of things.
Sample without replacement from your list of integers until they are all gone to create a "base network", and then randomly add on more to your heart's desire.
Make n_edges sufficiently high that it's very likely your condition will be met. If it's not, try again.
Really depends on what you're going to use the network for and what kind of structure you want it to have
EDIT: I have updated my response to be more robust to an arbitrary list of values rather than requiring a sequential list
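The "base network" idea can be sketched as follows. This is one possible reading of the suggestion, not code from the answer: the function name random_edges_covering and its parameters are invented for illustration, and it assumes at least two nodes.

```python
import random

def random_edges_covering(nodes, extra_edges, rng=random):
    """Random edge list in which every node appears at least once.

    Assumes `nodes` contains at least two distinct node IDs.
    """
    order = list(nodes)
    rng.shuffle(order)
    # Base network: pair up the shuffled nodes so each one is used once.
    edges = [(order[i], order[i + 1]) for i in range(0, len(order) - 1, 2)]
    if len(order) % 2:
        # Odd count: attach the leftover node to a random other node.
        edges.append((order[-1], rng.choice(order[:-1])))
    # Extra random edges; sampling two distinct nodes avoids self-loops.
    edges += [tuple(rng.sample(order, 2)) for _ in range(extra_edges)]
    return edges

edge_list = random_edges_covering(range(1000), extra_edges=500)
covered = {v for edge in edge_list for v in edge}
print(len(edge_list), covered == set(range(1000)))
```

Building the base pairs first means the coverage condition holds by construction, so the assert from the earlier snippet should never fire and no retry loop is needed.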
import random
from random import randint
random.seed()  # with no argument, seeds from the OS or the current time
# to generate 100 tuples with randints in range 0-99
li = [(randint(0,99),randint(0,99)) for i in range(100)]
print(li)
[(80, 55), (3, 10), (66, 65), (26, 23), (8, 72), (83, 25), (24, 99), (72, 9), (52, 76), (72, 68), (67, 25), (72, 18), (94, 62), (7, 62), (49, 94), (29, 89), (11, 38), (52, 51), (19, 32), (20, 85), (56, 61), (4, 40), (97, 58), (82, 2), (50, 82), (77, 5), (2, 9), (2, 46), (39, 4), (74, 40), (69, 15), (1, 77), (45, 58), (80, 59), (85, 80), (27, 80), (81, 4), (22, 33), (77, 60), (75, 87), (43, 36), (60, 34), (90, 54), (75, 3), (89, 84), (51, 93), (62, 64), (81, 50), (15, 60), (33, 97), (42, 62), (83, 26), (13, 33), (41, 87), (29, 63), (4, 32), (6, 14), (79, 73), (95, 4), (41, 16), (96, 64), (15, 28), (35, 13), (35, 82), (77, 16), (63, 27), (75, 37), (11, 52), (21, 35), (37, 96), (9, 86), (83, 11), (5, 42), (34, 32), (17, 8), (65, 55), (58, 19), (90, 40), (18, 75), (29, 14), (0, 11), (25, 68), (34, 52), (22, 8), (12, 53), (16, 49), (73, 54), (78, 80), (74, 60), (40, 68), (69, 20), (37, 38), (74, 60), (53, 90), (25, 48), (44, 52), (49, 27), (28, 35), (29, 94), (35, 60)]
Here is a solution that first generates a random population of nodes (pop1), then shuffles it (pop2) and combines those into a list of pairs.
Note: this method only yields edges in which each node appears exactly once as a start and exactly once as an end, so it may not be what you're after. See below for another method.
import random, copy
random.seed() # defaults to OS randomness or the current time
# extract a number of samples - the number of nodes you want
pop1 = random.sample(range(1000), 10)
pop2 = copy.deepcopy( pop1 )
random.shuffle( pop2 )
# generate pairs from the same population - this guarantees your constraint
pairs = list(zip( pop1, pop2 ))
print(pairs)
Output:
[(17, 347), (812, 688), (347, 266), (731, 342), (342, 49), (904, 17), (49, 731), (50, 904), (688, 50), (266, 812)]
Here is another method
This allows for duplicate occurrences of the nodes.
The idea is to draw start and end nodes from the same population:
import random
random.seed()
population = range(10) # any population would do
# choose randomly from the population for both ends
# so you can have duplicates
pairs = [(random.choice(population), random.choice(population)) for _ in range(100)]
print(pairs[:10])
Output:
[(1, 9), (7, 1), (8, 6), (4, 7), (6, 2), (7, 3), (0, 2), (1, 0), (8, 3), (8, 3)]
