Comparing X Y coordinates of the same list - python

I have a list of (x, y) coordinate tuples. I am trying to eliminate the coordinates that are very close to each other, using the Euclidean distance. However, the code so far does not perform as expected, especially as the number of coordinates increases.
So far, I have found online how to compare two lists of coordinates, but not how to compare the elements within the same list.
Hence, what I have done is split the list into its first element and the remainder, and then compare the Euclidean distance between the first element and each remaining element. If an element is within the threshold distance, it is removed from the list. Then the list is updated and the procedure is repeated. However, it does not perform as expected.
from scipy.spatial import distance
# List of coordinates.
xy = [(123, 2191), (44, 2700), (125, 2958), (41, 3368), (33, 4379), (78, 4434), (75, 5897), (50, 6220), (75, 7271), (80, 7274), (58, 8440), (60, 8440), (59, 8441), (32, 9699), (54, 9758), (58, 9759), (43, 10113), (64, 10252), (57, 12118), (61, 12120), (60, 14129), (61, 14129), (66, 15932), (68, 15933), (53, 17302), (57, 17304), (84, 20012), (84, 20013), (102, 20222), (49, 21257), (47, 21653), (56, 27042), (51, 28200), (48, 28201), (55, 28202), (65, 29366), (43, 29484), (67, 29808), (32, 30840), (31, 30842), (48, 36368), (48, 36369), (49, 36369), (21, 37518), (102, 37519)]
uni = []
for x in xy[:]:
    for i, j in enumerate(xy):
        if i == 0:
            new_xy = j # New List comprising of first element of the list
            remaining_xy = list(set(xy) - set(new_xy)) # rest of list converted into a separate list
            for m in remaining_xy:
                print(new_xy , m, distance.euclidean(new_xy , m))
                if distance.euclidean(new_xy ,m) < 1000: # If distance less then threshold, remove.
                    remaining_xy.remove(m)
            xy = remaining_xy #reset xy
            remaining_xy = [] #reset remaining_xy
    uni.append(new_xy) # append unique values.
print(len((uni)), uni)
However, for example, the output shows
..., (53, 17302), (57, 17304), ...
Which does not satisfy the threshold.

For me your code is actually working. Maybe just change your last print statement to:
print(len(set(uni)), set(uni))
These outputs seem right for me. All coordinates in the set(uni) are more than 1000 apart from each other.
I get the following:
23 {(68, 15933), (58, 8440), (75, 7271), (51, 28200), (21, 37518), (61, 14129), (84, 20012), (65, 29366), (50, 6220), (49, 21257), (53, 17302), (41, 3368), (33, 4379), (64, 10252), (58, 9759), (56, 27042), (57, 12118), (78, 4434), (32, 30840), (31, 30842), (48, 36369), (48, 28201), (123, 2191)}
Update:
Unfortunately I haven't tested the complete output... I cannot directly find the issue in your code, but with a recursive function you will get the correct result you are looking for:
def recursiveCoord(_coordinateList):
    # Base case: nothing left to compare.
    if not _coordinateList:
        return []
    xy_0 = _coordinateList[0]
    # Keep only the later coordinates that are at least 1000 away from xy_0.
    new_xy_list = []
    for coord in _coordinateList[1:]:
        dist = distance.euclidean(xy_0, coord)
        if dist >= 1000:
            new_xy_list.append(coord)
    return [xy_0] + recursiveCoord(new_xy_list)
Call it like this:
uni = recursiveCoord(xy)
and you will get a list with all unique coordinates.
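If recursion feels heavy for this, the same idea can be written as a plain loop. This is only a minimal sketch using the standard library; the function name filter_close_points and the small sample list are made up for illustration, and the rule it applies is "keep a point only if it is at least the threshold away from every point already kept".

```python
import math

def filter_close_points(points, threshold):
    """Greedy filter: keep a point only if it is far enough from all kept points."""
    kept = []
    for p in points:
        # math.hypot gives the Euclidean distance between the two 2-D points.
        if all(math.hypot(p[0] - q[0], p[1] - q[1]) >= threshold for q in kept):
            kept.append(p)
    return kept

# Small excerpt of the coordinates from the question.
sample = [(53, 17302), (57, 17304), (84, 20012), (84, 20013), (102, 20222)]
result = filter_close_points(sample, 1000)
print(result)
```

Unlike the set-based version, this keeps the original order of the list, and no list is modified while it is being iterated over.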

Related

Using Python Pandas, why is my vectorized gradient descent so much slower than my gradient descent using loops?

Here is my vectorized version:
import pandas as pd
import numpy as np
alpha=.001
data = [ (2, 3), (4, 7), (6, 11), (8, 17), (10, 23), (12, 31), (14, 39), (16, 49), (18, 59), (20, 71), (22, 83), (24, 97), (26, 113), (28, 131), (30, 149), (32, 169), (34, 191), (36, 214), (38, 239), (40, 266), (42, 295), (44, 326), (46, 359), (48, 394), (50, 431)]
##CREATE EXAMPLES MATRIX
x_coordinates = [x[0] for x in data]
x_coords=[]
[x_coords.append([1,x]) for x in x_coordinates]
#Creates a list of all x-coordinates with a 1 column
examples=pd.DataFrame(x_coords).transpose()
#uses that list to create a dataframe. Must transpose so it has dimension (2, 25): rows are features, columns are specific examples.
##CREATE THETA MATRIX/VECTOR
theta_list = [1, 2]
theta = pd.DataFrame(theta_list).transpose()
#creates a df of dimension 1,2.
##CREATE Y VECTOR/MATRIX
y_coordinates = [x[1] for x in data]
y=pd.DataFrame(y_coordinates).transpose()
deriv=pd.DataFrame([])
count=0
while (deriv != 0).all().all() and count <= 500000:
    length=len(data)
    #theta*X
    thetaX=theta.dot(examples)
    error=thetaX-y
    error_pt2=error.dot(examples.T)
    deriv=alpha*(1/length)*error_pt2
    theta=theta-deriv
    print(theta)
    count+=1
print(count)
Here is my version made with loops:
total=0
th0=0
th1=0
alpha=0.001
deriv0=1
deriv1=1
count=0
while deriv0 and deriv1 != 0 and count<=1000000:
    total0=0
    total1=0
    #th0
    for i in data:
        hyp=th0+(th1*i[0])
        #print("Hyp is {}".format(hyp))
        total0+=(hyp-i[1])
    deriv0=(1/25)*total0
    th0temp=th0-(alpha*(deriv0))
    #th1
    for i in data:
        hyp=th0+(th1*i[0])
        total1+=(hyp-i[1])*i[0]
    deriv1=(1/25)*total1
    th1temp=th1-(alpha*(deriv1))
    th0=th0temp
    th1=th1temp
    th0temp=0
    th1temp=0
    count+=1
    print("Theta 0: {} \n Theta 1: {} \n\n".format(th0,th1))
print(count)
When I run the vectorized version, it takes almost 10 times as long to run. I would have expected that using vectors would make this code much more efficient than running multiple loops. What gives? Is it just the computational overhead of Pandas that makes this run slower? Maybe Pandas isn't suited for this sort of algorithm.

Assigning Variables to Tuples in a List

I am currently using a function to return a list of tuples (coordinates). I need to assign these coordinates variables so I can use them in a for loop.
My function is:
new_connect = astar.get_path(n1x, n1y, n2x, n2y)
with print(new_connect) I get the output:
[(76, 51), (75, 51), (74, 51), (73, 51), (72, 51), (71, 51), (70, 51), (69, 51), ...]
I need to assign these tuples variables i.e. (x, y)
So they can be used in the following for loop:
for x in range(new_connect):
    for y in range(new_connect):
        self.tiles[x][y].blocked = False
        self.tiles[x][y].block_sight = False
Which (should) plot the coordinates and change their tile values.
Any help is greatly appreciated. I've been stuck working on this and feel like I'm missing something super simple.
You can use unpacking:
new_connect = [(76, 51), (75, 51), (74, 51), (73, 51), (72, 51), (71, 51), (70, 51), (69, 51)]
for x, y in new_connect:
    print(x, y)
So, it isn't clear how range(new_connect) is actually working. It shouldn't. You should receive a TypeError, because a list object is not the correct argument to range.
That said, you should be able to create a for loop for a list of tuples by performing the tuple unpacking in the for statement itself.
for x, y in astar.get_path(...):
    ...
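To show how the unpacking fits the tile update from the question, here is a small self-contained sketch. The Tile class, the grid size and the short path are stand-ins invented for illustration; in the real code the path would come from astar.get_path and the grid would be self.tiles.

```python
class Tile:
    """Hypothetical stand-in for the tile objects used in the question."""
    def __init__(self):
        self.blocked = True
        self.block_sight = True

# Assumed grid dimensions, indexed as tiles[x][y] like in the question.
width, height = 80, 60
tiles = [[Tile() for _ in range(height)] for _ in range(width)]

# Stand-in for the list returned by astar.get_path(...).
path = [(76, 51), (75, 51), (74, 51)]

# Unpack each (x, y) tuple directly in the for statement.
for x, y in path:
    tiles[x][y].blocked = False
    tiles[x][y].block_sight = False
```

A single loop is enough here: each tuple already holds one x and one y, so there is no need for nested loops or range.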

Closest pair of point by brute force

I do not know what is wrong with my code. I generate 100 random points and I want to find the closest pair of these points, but the result is wrong.
#Closest pair
from math import sqrt
from random import randint
arr1=[]
dist=0
p1=[]
p2=[]
min1=1000
for i in range(0, 100):
    arr1.append([randint(0,100),randint(0,100)])
print(arr1)
print("\n")
def dist(a,b):
    x=pow((a[0]-b[0]),2)
    y=pow((a[1]-b[1]),2)
    return sqrt(x+y)
for i in range(0, len(arr1)):
    for j in range(i+1, len(arr1)):
        dis=dist(arr1[i],arr1[j])
        if(dis<min1):
            min1=dis
            p1=arr1[i]
            p2=arr1[j+1]
print(p1,"",p2,min1)
#print (sorted(arr1))
OK, you assume that (1, 5) and (5, 1) are the same pair, which is correct. However, while j loops from i+1 up to the last index, you assign arr1[j+1], which I think is wrong. Consider what happens when j is the last index (99) and that pair is the closest: you end up reading arr1[100], which is out of range.
As İhsan Cemil Çiçek mentions, the main problem with your code is that you have p2=arr1[j+1], which should be p2=arr1[j].
However, there are a couple of things you can do to make this code more efficient.
There's no need to take the square root for every distance test. For non-negative d1 and d2, if sqrt(d1) < sqrt(d2) then d1 < d2, so we can just test the squared distances, and we only need to do a single expensive square root calculation when we've found the minimum.
Python has an efficient min function, so there's no need to find the minimum manually. Normally, min does a simple comparison of the values you pass it, but you can also supply it with a key function which it will use to make the comparisons.
You can use the combinations function from the standard itertools module to produce pairs of items from your points list with a single loop. This doesn't save much time, but it's cleaner than having a double loop.
Also, it's a good idea to supply a seed value to the random number generator when developing code that produces random values. This makes it easier to test & debug your code because it makes the results reproducible.
In the code below I've increased the range of the coordinates, because with 100 points with coordinates in the range 0 to 100 there's a high chance of generating duplicate points. You might like to use a set instead of a list if you don't want duplicate points.
from math import sqrt
from random import seed, randint
from itertools import combinations
seed(17)
high = 1000
numpoints = 100
points = [(randint(0, high), randint(0, high)) for _ in range(numpoints)]
points.sort()
print(points, '\n')
def dist(t):
    a, b = t
    x = a[0] - b[0]
    y = a[1] - b[1]
    return x*x + y*y
t = min(combinations(points, 2), key=dist)
a, b = t
print('{} {}: {}'.format(a, b, sqrt(dist(t))))
output
[(9, 51), (18, 443), (19, 478), (21, 635), (27, 254), (50, 165), (52, 918), (55, 746), (70, 316), (95, 707), (112, 939), (113, 929), (126, 903), (132, 256), (143, 832), (145, 698), (154, 692), (187, 200), (197, 765), (201, 154), (203, 317), (217, 51), (244, 119), (257, 983), (258, 880), (264, 76), (273, 65), (279, 343), (296, 178), (325, 655), (326, 174), (338, 552), (340, 96), (363, 51), (368, 59), (381, 585), (383, 593), (393, 834), (411, 140), (412, 496), (419, 83), (485, 648), (491, 76), (513, 821), (519, 962), (534, 424), (539, 980), (545, 572), (549, 312), (555, 87), (564, 63), (566, 923), (568, 545), (570, 218), (577, 537), (592, 801), (618, 848), (655, 614), (673, 413), (674, 314), (677, 284), (702, 141), (702, 215), (721, 553), (732, 654), (749, 974), (762, 279), (764, 429), (766, 732), (770, 756), (771, 356), (784, 722), (789, 319), (792, 5), (805, 282), (810, 896), (821, 978), (824, 911), (826, 310), (830, 323), (831, 418), (832, 518), (836, 400), (859, 256), (862, 996), (866, 700), (879, 485), (888, 415), (903, 722), (930, 588), (931, 496), (938, 356), (942, 323), (942, 344), (948, 429), (967, 741), (980, 254), (982, 488), (982, 604), (983, 374)]
(381, 585) (383, 593): 8.246211251235321
It will only work for the first point; for every other point in the list you are only checking the remaining points (i+1 to n), not all points (the closest one may also be in 0 to i).
You should use enumerate in the for loops. Right now you are checking point i against all the points that appear after it in the array, but what about the points before it?
Also, you need to save the two points that meet the distance condition as the i and j points, so why arr1[j+1]?
Try this, I think it should work:
from math import sqrt
from random import randint
arr1=[]
dist=0
p1=[]
p2=[]
min1=1000
for i in range(0, 100):
    arr1.append([randint(0,100),randint(0,100)])
print(arr1)
print("\n")
def dist(a,b):
    x=pow((a[0]-b[0]),2)
    y=pow((a[1]-b[1]),2)
    return sqrt(x+y)
for i,x in enumerate (arr1):
    for j,y in enumerate (arr1):
        if (x != y):
            dis=dist(arr1[i],arr1[j])
            if(dis<min1):
                min1=dis
                p1=arr1[i]
                p2=arr1[j]
print(p1,"",p2,min1)
print (sorted(arr1))

How to generate a list with couples of random integers?

I am rather new to Python and NetworkX. I need to create a list similar to Edgelist=[(0,1),(0,3),(1,0),(1,2),(1,4),(2,1),(2,5)], whose elements represent the starting and ending node of an edge (link) that is in turn part of a network.
Rather than setting them manually, I want Python to create the couples you see in the list by randomly selecting the integer values of (start,end) from an assigned range of values (namely, 0, 999), which represent the node IDs. Then, I want to make sure that every node ID is included at least once in the series of (start,end) values (this means that all my nodes will be connected to at least one other node).
I know I could use random.randint(0, 999) but I don't know how to "nest" it into the creation of a list (perhaps a for loop?). I wish I had some code to show you but this is my first attempt at working with NetworkX!
EDIT
To give you a visual idea of what I mean, here are two images. The first is a regular network (aka lattice), and the second is a random one. The edge list of the first was created manually in order to reproduce a chess table, while the second displays an edge list which is a (manually) shuffled counterpart of the first one. As you see, the nodes are kept in exactly the same locations. Hope this helps a bit more. Thanks!
There is a similar answer, but for a complete graph, in: How to generate a fully connected subgraph from node list using python's networkx module
In your case, using zubinmehta's answer:
import networkx
import itertools
def complete_graph_from_list(L, create_using=None):
    G = networkx.empty_graph(len(L), create_using)
    if len(L) > 1:
        if G.is_directed():
            edges = itertools.permutations(L, 2)
        else:
            edges = itertools.combinations(L, 2)
        G.add_edges_from(edges)
    return G
You could build the graph as:
# list() is needed on Python 3, where map returns an iterator that has no len()
S = complete_graph_from_list(list(map(str, range(0, 1000))))
print(S.edges())
Here is a networkx command that will create a graph such that each node has exactly one edge:
import networkx as nx
G = nx.configuration_model([1]*1000)
If you look into the guts of it, it does the following which answers your question - each node will appear in exactly one edge.
import random
mylist = list(range(start, end))
random.shuffle(mylist)  # shuffles in place and returns None
edgelist = []
while mylist:
    edgelist.append((mylist.pop(), mylist.pop()))
You should guarantee that mylist has even length before going through the popping.
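As a rough, runnable sketch of that shuffle-and-pair idea (not the actual NetworkX internals), the function below pairs up node IDs so that every node appears in exactly one edge. The name random_pairing is made up for illustration, and the even-length requirement is checked explicitly.

```python
import random

def random_pairing(node_ids):
    """Pair up shuffled node IDs so each node appears in exactly one edge."""
    nodes = list(node_ids)
    if len(nodes) % 2 != 0:
        raise ValueError("need an even number of nodes to pair them all")
    random.shuffle(nodes)  # in-place shuffle
    # Take consecutive elements of the shuffled list as (start, end) pairs.
    return [(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]

edges = random_pairing(range(1000))
print(edges[:5])
```

With 1000 nodes this gives 500 edges, so every node is connected to exactly one other node.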
Python has a built-in library called itertools.
Here is a sample of how you can achieve what you mentioned:
import itertools
values = [3, 4, 6, 7]
sublist_length = 2
comb = itertools.combinations(values, sublist_length)
This will return comb as an iterator.
You can call next(comb) to get the next element of the iterator, or iterate over it with a for loop to get all the results, as below.
for item in comb:
    print(item)
which should output:
(3, 4)
(3, 6)
(3, 7)
(4, 6)
(4, 7)
(6, 7)
I hope this will solve your problem.
For the list creation you can do something like:
import random
max_id = 999
min_id = 0
original_values = range(min_id, max_id + 1) # could be arbitrary list
n_edges = # some number..
my_edge_list = [(random.choice(original_values), random.choice(original_values))
                for _ in range(n_edges)]
To assert you have all values in there you can do the following
vals = set([v for tup in my_edge_list for v in tup])
assert all([v in vals for v in original_values])
The assert will make sure you have the proper representation in your edges. As for doing your best to make sure you don't hit that assert, you can do a couple of things:
Sample without replacement from your list of integers until they are all gone to create a "base network", and then randomly add more edges to your heart's desire.
Make n_edges sufficiently high that it's very likely your condition will be met. If it's not, try again...
Really depends on what you're going to use the network for and what kind of structure you want it to have
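One possible reading of the "base network" suggestion, as a hedged sketch only: shuffle the node IDs, link consecutive nodes in the shuffled order so every ID is used at least once, then append extra random edges. The function name random_edge_list, the seed parameter and the choice of chaining consecutive nodes are assumptions made for illustration, not something prescribed above.

```python
import random

def random_edge_list(node_ids, n_extra_edges, seed=None):
    """Build random edges such that every node ID appears at least once."""
    rng = random.Random(seed)  # local generator so results can be reproduced
    nodes = list(node_ids)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to form an edge")
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    # Base network: chain consecutive nodes of the shuffled order.
    edges = list(zip(shuffled, shuffled[1:]))
    # Extra edges: two distinct nodes drawn at random each time.
    for _ in range(n_extra_edges):
        start, end = rng.sample(nodes, 2)
        edges.append((start, end))
    return edges

my_edge_list = random_edge_list(range(1000), n_extra_edges=50, seed=1)
covered = {v for edge in my_edge_list for v in edge}
print(len(my_edge_list), len(covered))
```

Because the base chain already touches every node, the coverage check cannot fail here, whatever number of extra edges is added afterwards.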
EDIT: I have updated my response to be more robust to an arbitrary list of values rather than requiring a sequential list
import random
from random import randint
random.seed() # with no argument, seeds from the current time or an OS randomness source
# to generate 100 tuples with randints in range 0-99
li = [(randint(0,99),randint(0,99)) for i in range(100)]
print(li)
[(80, 55), (3, 10), (66, 65), (26, 23), (8, 72), (83, 25), (24, 99), (72, 9), (52, 76), (72, 68), (67, 25), (72, 18), (94, 62), (7, 62), (49, 94), (29, 89), (11, 38), (52, 51), (19, 32), (20, 85), (56, 61), (4, 40), (97, 58), (82, 2), (50, 82), (77, 5), (2, 9), (2, 46), (39, 4), (74, 40), (69, 15), (1, 77), (45, 58), (80, 59), (85, 80), (27, 80), (81, 4), (22, 33), (77, 60), (75, 87), (43, 36), (60, 34), (90, 54), (75, 3), (89, 84), (51, 93), (62, 64), (81, 50), (15, 60), (33, 97), (42, 62), (83, 26), (13, 33), (41, 87), (29, 63), (4, 32), (6, 14), (79, 73), (95, 4), (41, 16), (96, 64), (15, 28), (35, 13), (35, 82), (77, 16), (63, 27), (75, 37), (11, 52), (21, 35), (37, 96), (9, 86), (83, 11), (5, 42), (34, 32), (17, 8), (65, 55), (58, 19), (90, 40), (18, 75), (29, 14), (0, 11), (25, 68), (34, 52), (22, 8), (12, 53), (16, 49), (73, 54), (78, 80), (74, 60), (40, 68), (69, 20), (37, 38), (74, 60), (53, 90), (25, 48), (44, 52), (49, 27), (28, 35), (29, 94), (35, 60)]
Here is a solution that first generates a random population of nodes (pop1), then shuffles it (pop2) and combines those into a list of pairs.
Note: this method only yields edge lists where each node is exactly once a start and exactly once an end, so it may not be what you're after. See below for another method.
import random, copy
random.seed() # defaults to the current time or an OS randomness source
# extract a number of samples - the number of nodes you want
pop1 = random.sample(range(1000), 10)
pop2 = copy.deepcopy( pop1 )
random.shuffle( pop2 )
# generate pairs from the same population - this guarantees your constraint
pairs = list(zip( pop1, pop2 ))
print(pairs)
Output:
[(17, 347), (812, 688), (347, 266), (731, 342), (342, 49), (904, 17), (49, 731), (50, 904), (688, 50), (266, 812)]
Here is another method
This allows for duplicate occurrences of the nodes.
The idea is to draw start and end nodes from the same population:
import random
random.seed()
population = range(10) # any population would do
# choose randomly from the population for both ends
# so you can have duplicates
pairs = [(random.choice(population), random.choice(population)) for _ in range(100)]
print(pairs[:10])
Output:
[(1, 9), (7, 1), (8, 6), (4, 7), (6, 2), (7, 3), (0, 2), (1, 0), (8, 3), (8, 3)]

Python 3: Converting A Tuple To A String

I have the following code:
var_one = var_two[var_three-1]
var_one = "string_one" + var_one
And I need to do the following to it:
var_four = 'string_two', var_one
However, this returns the following error:
TypeError: Can't convert 'tuple' object to str implicitly
I have tried things such as str(var_one) and using strip but these did not work.
What can I do to achieve the result I require?
EDIT - Here are what the variables contain:
var_one: new variable
var_two: tuple
var_three: integer
var_four: new
EDIT2:
The line in the program that makes the error is: os.system(var_four)
one_big_string = ''.join(tuple)
print(one_big_string)
What you've written is fine:
>>> x = 1
>>> y = 1, x
>>>
The problem is that somewhere else in your code, you're using var_four as a string where it should be a tuple.
BTW, I think it's neater to put parentheses around tuples like this; otherwise I tend to think they're being used in tuple unpacking.
EDIT: There are all sorts of ways to join and format strings -- Python is good at that. In somewhat-decreasing order of generality:
"{first_thing} {second_thing}".format(first_thing=var_one, second_thing=var_two)
"{0} {1}".format(var_one, var_two)
var_one + var_two
Your code looks fine as is.
Try running import pdb; pdb.set_trace() in your program to see if you can find the line triggering the error.
EDIT: You'll want to use ''.join(var_four) to convert var_four into a string before adding it to whatever it is you want to use it. Please note that this will actually create a new string and not overwrite var_four. See Python 3 string.join() equivalent?
Also, you should be using the subprocess module instead of os.system. See the Python 3.x documentation.
os.system expects a string which it will execute in the shell, but you're giving it a tuple instead.
Imagine we want to run the command rm -rf /home/mike. You might be doing something like
binary_and_option = 'rm -rf'
directory = '/home/mike'
command = binary_and_option, directory # This is the tuple
# ('rm -rf', '/home/mike')
# it is NOT the string
# 'rm -rf /home/mike'
os.system(command) # this clearly won't work, since it's just
# os.system(('rm -rf', '/home/mike'))
What you want to do instead is
command = "%s %s" % (binary_and_option, directory)
to assemble the string. You are probably thinking that a comma assembles str-ed objects together with spaces in between, but that's only for print; it's not how strings work in general.
But wait, there's more! You never want to use os.system, especially when you're going to build commands. It invokes the shell (which introduces unnecessary security risks and other penalties) and has an inflexible API. Instead, use the subprocess module.
import subprocess
binary_and_option = ['rm', '-rf']
directory = '/home/mike'
command = binary_and_option + [directory]
subprocess.call(command)
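For something that is safe to run as-is, here is a small sketch of the same two points: a tuple has to be joined explicitly to become a string, and subprocess can take the sequence of arguments directly without a shell. The command used (the current Python interpreter printing a message) and the variable names are stand-ins chosen for illustration; it assumes Python 3.7 or later for capture_output.

```python
import subprocess
import sys

# Hypothetical stand-in for var_four: a tuple of command parts.
command_parts = (sys.executable, "-c", "print('hello from child')")

# A tuple is not a string; join it explicitly if a single string is needed.
command_string = " ".join(command_parts)

# subprocess.run accepts the argument sequence directly and does not use a shell.
result = subprocess.run(list(command_parts), capture_output=True, text=True, check=True)
output = result.stdout.strip()
print(output)
```

Passing a list avoids quoting problems entirely, which is one reason it is usually preferred over building one long command string.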
str(my_tuple)
This seems too easy, but this works in Python 3.6
>>> x = list(range(100))
>>> y = list(range(500, 600))
>>> zip_obj = zip(x, y)
>>> my_tuple = tuple(zip_obj)
>>> type(my_tuple)
<class 'tuple'>
>>> tuple_str = str(my_tuple)
>>> tuple_str
'((0, 500), (1, 501), (2, 502), (3, 503), (4, 504), (5, 505), (6, 506), (7, 507), (8, 508), (9, 509), (10, 510), (11, 511), (12, 512), (13, 513), (14, 514), (15, 515), (16, 516), (17, 517), (18, 518), (19, 519), (20, 520), (21, 521), (22, 522), (23, 523), (24, 524), (25, 525), (26, 526), (27, 527), (28, 528), (29, 529), (30, 530), (31, 531), (32, 532), (33, 533), (34, 534), (35, 535), (36, 536), (37, 537), (38, 538), (39, 539), (40, 540), (41, 541), (42, 542), (43, 543), (44, 544), (45, 545), (46, 546), (47, 547), (48, 548), (49, 549), (50, 550), (51, 551), (52, 552), (53, 553), (54, 554), (55, 555), (56, 556), (57, 557), (58, 558), (59, 559), (60, 560), (61, 561), (62, 562), (63, 563), (64, 564), (65, 565), (66, 566), (67, 567), (68, 568), (69, 569), (70, 570), (71, 571), (72, 572), (73, 573), (74, 574), (75, 575), (76, 576), (77, 577), (78, 578), (79, 579), (80, 580), (81, 581), (82, 582), (83, 583), (84, 584), (85, 585), (86, 586), (87, 587), (88, 588), (89, 589), (90, 590), (91, 591), (92, 592), (93, 593), (94, 594), (95, 595), (96, 596), (97, 597), (98, 598), (99, 599))'
>>>
