k-closest point to a specific target

I am trying to find the k closest points to a target. For example, I have this set of points:
points = [(0,0), (19, 8), (23, 11), (25, 22), (30, 26), (23, 20), (30,0)]
I want to find the k closest points for each element of the list. I have tried this code:
def kClosest(target, points, k):
    return sorted(points, key=lambda x: (target[0]-x[0])**2 + (target[1]-x[1])**2)[:k]
for i in points:
    points_copy = points.copy()
    points_copy.remove(i)
    print(i, ": ", kClosest(i, points_copy, 3))
The output looks like this (which is what I expect):
(0, 0) : [(19, 8), (23, 11), (30, 0)]
(19, 8) : [(23, 11), (23, 20), (30, 0)]
(23, 11) : [(19, 8), (23, 20), (25, 22)]
(25, 22) : [(23, 20), (30, 26), (23, 11)]
(30, 26) : [(25, 22), (23, 20), (23, 11)]
(23, 20) : [(25, 22), (23, 11), (30, 26)]
(30, 0) : [(23, 11), (19, 8), (23, 20)]
Now I want to modify the output so that it returns the indices of the points instead. For example, I expect the output for point (0,0) to be [1, 2, 6]. How can I modify the code to produce the expected output?

Take a look at the Neighbors module from sklearn:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.neighbors
It provides optimized methods for exactly this.

I also think you should use an optimized module such as scikit-learn, as suggested in @aurelien_morel's answer.
In any case, a very simple solution to what you are asking is to work with the indices of the list points: sort the indices by the distance of the point each one refers to, and return the first k.
points = [(0,0), (19, 8), (23, 11), (25, 22), (30, 26), (23, 20), (30,0)]
def kClosest(target, points, k):
    # Sort indices into the full list (not a copy with the target removed, which would
    # shift every index after it), skipping the target point itself
    candidates = [idx for idx, p in enumerate(points) if p != target]
    return sorted(candidates, key=lambda idx: (target[0]-points[idx][0])**2 + (target[1]-points[idx][1])**2)[:k]
for i in points:
    print(i, ": ", kClosest(i, points, 3))

This is a rather primitive way of getting what you need, but it doesn't involve sklearn: before calling kClosest(), append the index of each point to its value, like this:
points_with_index = [p + (i,) for i, p in enumerate(points)]
Use points_with_index in your call to kClosest() instead of points. kClosest() only looks at value[0] and value[1], so the extra element doesn't affect the distances. When you print a result from points_with_index, use value[2] to show just the original index.
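For completeness, here is the index-augmented approach end to end, reusing the question's kClosest() unchanged. It is a small sketch: instead of copying the list and calling remove() for every target, it leaves the target out by object identity (an assumption of this sketch, which also keeps points with duplicate coordinates distinct).

```python
def kClosest(target, points, k):
    # Same function as in the question: sort by squared Euclidean distance to target
    return sorted(points, key=lambda x: (target[0] - x[0])**2 + (target[1] - x[1])**2)[:k]

points = [(0, 0), (19, 8), (23, 11), (25, 22), (30, 26), (23, 20), (30, 0)]
# Each entry becomes (x, y, original_index)
points_with_index = [p + (i,) for i, p in enumerate(points)]

results = {}
for p in points_with_index:
    others = [q for q in points_with_index if q is not p]  # leave the target itself out
    nearest = kClosest(p, others, 3)
    results[p[2]] = [q[2] for q in nearest]  # keep only the original indices
    print(p[:2], ": ", results[p[2]])
```

The first line printed is `(0, 0) :  [1, 2, 6]`, which matches the output the question asks for.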

Related

Using Python Pandas, why is my vectorized gradient descent so much slower than my gradient descent using loops?

Here is my vectorized version:
import pandas as pd
import numpy as np
alpha=.001
data = [ (2, 3), (4, 7), (6, 11), (8, 17), (10, 23), (12, 31), (14, 39), (16, 49), (18, 59), (20, 71), (22, 83), (24, 97), (26, 113), (28, 131), (30, 149), (32, 169), (34, 191), (36, 214), (38, 239), (40, 266), (42, 295), (44, 326), (46, 359), (48, 394), (50, 431)]
##CREATE EXAMPLES MATRIX
x_coordinates = [x[0] for x in data]
#Creates a list of all x-coordinates with a column of 1s
x_coords = [[1, x] for x in x_coordinates]
#Uses that list to create a DataFrame. Must transpose so it has dimension 2x25: rows are features, columns are individual examples.
examples=pd.DataFrame(x_coords).transpose()
##CREATE THETA MATRIX/VECTOR
theta_list = [1, 2]
#Creates a df of dimension 1x2.
theta = pd.DataFrame(theta_list).transpose()
##CREATE Y VECTOR/MATRIX
y_coordinates = [x[1] for x in data]
y=pd.DataFrame(y_coordinates).transpose()
deriv=pd.DataFrame([])
count=0
while (deriv != 0).all().all() and count <= 500000:
    length=len(data)
    #theta*X
    thetaX=theta.dot(examples)
    error=thetaX-y
    error_pt2=error.dot(examples.T)
    deriv=alpha*(1/length)*error_pt2
    theta=theta-deriv
    print(theta)
    count+=1
print(count)
Here is my version made with loops:
total=0
th0=0
th1=0
alpha=0.001
deriv0=1
deriv1=1
count=0
while deriv0 != 0 and deriv1 != 0 and count<=1000000:
    total0=0
    total1=0
    #th0
    for i in data:
        hyp=th0+(th1*i[0])
        #print("Hyp is {}".format(hyp))
        total0+=(hyp-i[1])
    deriv0=(1/25)*total0
    th0temp=th0-(alpha*(deriv0))
    #th1
    for i in data:
        hyp=th0+(th1*i[0])
        total1+=(hyp-i[1])*i[0]
    deriv1=(1/25)*total1
    th1temp=th1-(alpha*(deriv1))
    th0=th0temp
    th1=th1temp
    th0temp=0
    th1temp=0
    count+=1
    print("Theta 0: {} \n Theta 1: {} \n\n".format(th0,th1))
print(count)
When I run the vectorized version, it takes almost 10 times as long. I would have expected vectorization to make this code much more efficient than running multiple loops. What gives? Is it just the computational overhead of Pandas that makes this run slower? Maybe Pandas isn't suited to this sort of algorithm?
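One way to test the "Pandas overhead" hypothesis is to run the same update on plain NumPy arrays. Below is a minimal sketch, assuming most of the cost comes from pandas' per-operation work (index alignment and a new DataFrame for every intermediate result) on these tiny 1x2 and 2x25 frames. That is a common cause, not something measured here. The sketch also drops the per-iteration print(theta), which on its own can dominate the runtime. The name gradient_descent and the tol/max_iter stopping rule are illustrative choices that replace the exact `deriv != 0` test.

```python
import numpy as np

def gradient_descent(data, alpha=0.001, tol=1e-8, max_iter=500_000):
    """Batch gradient descent for y ~ theta0 + theta1 * x on plain NumPy arrays.

    Stops once every component of the update is smaller than tol (rather than
    waiting for it to be exactly 0), or after max_iter iterations.
    """
    xy = np.asarray(data, dtype=float)
    X = np.column_stack([np.ones(len(xy)), xy[:, 0]])  # shape (n_examples, 2): 1s, then x
    y = xy[:, 1]
    n = len(y)
    theta = np.array([1.0, 2.0])  # same starting values as theta_list in the question
    count = 0
    while count < max_iter:
        error = X @ theta - y              # shape (n_examples,)
        deriv = alpha * (X.T @ error) / n  # same update rule as the pandas version
        theta -= deriv
        count += 1
        if np.all(np.abs(deriv) < tol):
            break
    return theta, count

data = [(2, 3), (4, 7), (6, 11), (8, 17), (10, 23), (12, 31), (14, 39), (16, 49), (18, 59), (20, 71), (22, 83), (24, 97), (26, 113), (28, 131), (30, 149), (32, 169), (34, 191), (36, 214), (38, 239), (40, 266), (42, 295), (44, 326), (46, 359), (48, 394), (50, 431)]
theta, count = gradient_descent(data)
print(theta, count)
```

The result should agree with the closed-form least-squares fit (for example np.linalg.lstsq) to within the tolerance. Timing this against both versions in the question would show how much of the gap is due to pandas itself.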

Remove overlapping spans from list of tuple values based on the length of spans in python

I have a list of tuples representing character spans, but some of the spans overlap.
My aim is to modify the list so that, wherever spans overlap, only the larger span is kept and the smaller one is deleted.
Example:
Original list: [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
Modified list: [(2, 3), (7, 9), (10, 12), (15, 17), (20, 29)]
Here (10, 11), (16, 17), (20, 21) and (21, 28) were removed because they overlap with the larger spans (10, 12), (15, 17) and (20, 29) respectively.
I found some answers that deal with overlaps like this, but they don't handle keeping the larger span.
My idea was to sort by span length in descending order and then search for overlaps somehow, but I cannot figure out how to do that overlap search.

Code
values = [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
outputs = []
for value in values:
    flag = True
    outputsLoop = outputs[:]  # iterate over a copy, because outputs is modified inside the loop
    for output in outputsLoop:
        fromVal, toVal = output
        # Two closed spans overlap when each one starts before the other ends
        # (this also catches a new span that fully contains an existing one)
        if value[0] <= toVal and fromVal <= value[1]:
            if (value[1]-value[0]) > (toVal - fromVal):
                outputs.remove(output)
            else:
                flag = False
    if flag:
        outputs.append(value)
print(outputs)
Output
[(2, 3), (7, 9), (10, 12), (15, 17), (20, 29)]
Code explanation
The code loops through the values, compares each one against the spans kept so far, and whenever two spans overlap it keeps the one with the bigger span. It is probably not the most efficient or the cleanest code, but it works, and I believe it could be optimized further when I have time.
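The sort-by-length idea from the question can also be coded directly: visit the spans from longest to shortest and keep a span only if it doesn't overlap anything already kept. This is a minimal sketch. The function name remove_overlaps is hypothetical, and the code assumes spans are closed intervals, so two spans that only share an endpoint count as overlapping.

```python
def remove_overlaps(spans):
    """Keep the longest spans; drop any span overlapping an already-kept longer one."""
    kept = []
    # Longest first; sorted() is stable, so equal-length spans keep their input order
    for start, end in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        # Closed-interval test: no overlap only if this span ends before, or starts after, a kept one
        if all(end < k_start or start > k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)  # restore left-to-right order

values = [(2, 3), (7, 9), (10, 11), (10, 12), (15, 17), (16, 17), (20, 21), (20, 29), (21, 28)]
print(remove_overlaps(values))
```

Because longer spans are always considered first, the result does not depend on the order of the input list, except for ties between spans of equal length.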

Backtracking. Begin iteration again from zero in python

I'm implementing the funnel algorithm from http://ahamnett.blogspot.com/2012/10/funnel-algorithm.html in Python.
My question is about the iteration process: how do I restart the iteration with i = 0 and new lists?
The idea is to iterate through the two lists, rights and lefts, one element at a time, measure the angle between the two points, and compare it with the previously calculated angle. If at some point the calculated angle becomes larger (the funnel widens), I want to restart the iteration with i = 0 and new rights and lefts lists.
Currently I'm not getting any output. I think the problem is the call calcu(), where I want calcu() to start over again.
import math
def getAngle(a, b, c):
    ang = math.degrees(math.atan2(c[1]-b[1], c[0]-b[0]) - math.atan2(a[1]-b[1], a[0]-b[0]))
    return ang
triangles = [
    [(7, 10), (5, 15), (0, 0)],
    [(5, 15), (7, 10), (8, 15)],
    [(12, 10), (8, 15), (7, 10)],
    [(16, 12), (8, 15), (12, 10)],
    [(16, 12), (16, 19), (8, 15)],
    [(8, 24), (8, 15), (16, 19)],
    [(17, 25), (8, 24), (16, 19)],
    [(19, 19), (17, 25), (16, 19)],
    [(19, 19), (40, 19), (17, 25)],
    [(17, 25), (40, 19), (42, 25)]
]
start = (5, 12)
goal = (33, 22)
def calc(path):
    rights = [
        (5.0, 15.0), (8.0, 15.0), (8.0, 15.0),
        (8.0, 15.0), (8.0, 15.0), (8.0, 24.0),
        (17.0, 25.0), (17.0, 25.0), (17.0, 25.0)
    ]
    lefts = [
        (7.0, 10.0), (7.0, 10.0), (12.0, 10.0),
        (16.0, 12.0), (16.0, 19.0), (16.0, 19.0),
        (16.0, 19.0), (19.0, 19.0), (40.0, 19.0)
    ]
    rights.append(goal)
    lefts.append(goal)
    rig = rights[0]
    lef = lefts[0]
    node = start
    ang = getAngle(lef, node, rig)
    nodes_list = [node]
    def calcu():
        nonlocal rights
        nonlocal lefts
        while nodes_list[-1] != goal:
            for i in range(0, len(rights)-1):
                rig = rights[i+1]
                nonlocal lef
                nonlocal ang
                lef = lef
                ang2 = getAngle(lef, nodes_list[-1], rig)
                if ang2 <= ang:
                    rig = rig
                    lef = lefts[i+1]
                    ang = getAngle(lef, nodes_list[-1], rig)
                    if ang > ang2:
                        nodes_list.append(lefts[i])
                        rights = rights[i+1:]
                        lefts = lefts[i+1:]
                        i = 0
                        calcu()
                    else:
                        continue
                    return nodes_list
                else:
                    nodes_list.append(rights[i])
                    rights = rights[i+1:]
                    lefts = lefts[i+1:]
                    i = 0
                    calcu()
                    return nodes_list
abc = calc(triangles)
print(abc)
In the code above I want to get the output ((5, 12), (8, 15), (16, 19), (33, 22)).

A couple of things to look at. Your code structure is kind of a mess, but that can be fixed! First, you aren't getting any output because the function calc has no return statement. You also have an inner function calcu that is never called from calc.
Start over and avoid any inner functions and global/nonlocal stuff... not needed. I would start with a function that takes an origin and a list of lefts & list of rights and see if you can get it to calculate the individual angles. That will give you the basic building block for another function to consume those outputs and make your path based on those returns.
Something like this in pseudocode:
lefts = [ ... ]
rights = [ ... ]
origin = (x0, y0)
def find_funnel(origin, lefts, rights):
    # use the origin and walk through ALL the lefts and rights
    # to get all the angles as proof-of-concept
    loop through all the lefts/rights:
        print (left, right, angle) # <-- use this for troubleshooting, then remove
    return (last_left, last_right)
Get that running such that it walks through all of your lefts and rights and spits out believable angles, then you can alter that function for the stopping conditions appropriate for the funnel algorithm such that when it runs, it stops appropriately and returns the "last left and last right" of a valid funnel for the origin you sent it.
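A runnable version of the proof-of-concept step might look like the sketch below, using the lefts, rights and start point from the question. Two assumptions to flag: get_angle is the question's getAngle formula renamed, and this find_funnel returns every (left, right, angle) triple instead of only (last_left, last_right), because the funnel stopping condition has not been added yet.

```python
import math

def get_angle(a, b, c):
    # Same formula as getAngle() in the question: angle in degrees at vertex b,
    # from the ray b->a to the ray b->c
    return math.degrees(math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0]))

def find_funnel(origin, lefts, rights):
    # Proof-of-concept only: walk through ALL the lefts and rights and compute each angle.
    # There is no funnel stopping condition yet; add it once the angles look right.
    angles = []
    for left, right in zip(lefts, rights):
        angle = get_angle(left, origin, right)
        print(left, right, angle)  # troubleshooting output; remove later
        angles.append((left, right, angle))
    return angles

# Data copied from the question (without the goal appended)
start = (5, 12)
lefts = [(7.0, 10.0), (7.0, 10.0), (12.0, 10.0), (16.0, 12.0), (16.0, 19.0),
         (16.0, 19.0), (16.0, 19.0), (19.0, 19.0), (40.0, 19.0)]
rights = [(5.0, 15.0), (8.0, 15.0), (8.0, 15.0), (8.0, 15.0), (8.0, 15.0),
          (8.0, 24.0), (17.0, 25.0), (17.0, 25.0), (17.0, 25.0)]

angles = find_funnel(start, lefts, rights)
```

The difference of two atan2 results is not normalized, so it can fall outside (-180, 180]. Normalizing it may matter once you start comparing angles to decide when the funnel widens.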

How to generate a list with couples of random integers?

I am rather new to Python and NetworkX. I need to create a list similar to Edgelist=[(0,1),(0,3),(1,0),(1,2),(1,4),(2,1),(2,5)], whose elements represent the start and end nodes of an edge (link) in a network.
Rather than setting them manually, I want Python to create the couples you see in the list by randomly selecting the integer values of (start,end) from an assigned range of values (namely, 0 to 999), which represent the node IDs. Then, I want to make sure that every node ID appears at least once in the series of (start,end) values (this means that all my nodes will be connected to at least one other node).
I know I could use random.randint(0, 999), but I don't know how to "nest" it into the creation of a list (perhaps with a for loop?). I wish I had some code to show you, but this is my first attempt at working with NetworkX!
EDIT
To give you a visual idea of what I mean, here are two images. The first is a regular network (aka lattice), and the second is a random one. The edge list of the first was created manually to reproduce a chessboard, while the second displays an edge list that is a (manually) shuffled counterpart of the first one. As you can see, the nodes are kept in exactly the same locations. Hope this helps a bit more. Thanks!

There is a similar answer, but for a complete graph: How to generate a fully connected subgraph from node list using python's networkx module
In your case, using zubinmehta's answer:
import itertools
import networkx
def complete_graph_from_list(L, create_using=None):
    L = list(L)  # accept any iterable, e.g. a map object in Python 3
    G = networkx.empty_graph(0, create_using)
    G.add_nodes_from(L)  # the items of L themselves become the nodes
    if len(L) > 1:
        if G.is_directed():
            edges = itertools.permutations(L, 2)
        else:
            edges = itertools.combinations(L, 2)
        G.add_edges_from(edges)
    return G
You could build the graph as:
S = complete_graph_from_list(map(str, range(0, 1000)))
print(S.edges())

Here is a networkx command that will create a graph such that each node has exactly one edge:
import networkx as nx
G = nx.configuration_model([1]*1000)
Under the hood it does roughly the following, which answers your question: each node appears in exactly one edge.
import random
start, end = 0, 1000  # node IDs 0..999
mylist = list(range(start, end))
random.shuffle(mylist)  # shuffles in place and returns None
edgelist = []
while mylist:
    edgelist.append((mylist.pop(), mylist.pop()))
Make sure mylist has an even length before popping; otherwise the final pop() raises IndexError on the empty list.

Python has a built-in library called itertools.
Here is a sample of how to achieve what you mentioned:
import itertools
values = [3, 4, 6, 7]
sublist_length = 2
comb = itertools.combinations(values, sublist_length)
This returns comb as an iterator.
You can call next(comb) to get the next element of the iterator, or iterate over it with a for loop to get all the results you want, as below:
for item in comb:
    print(item)
which should output:
(3, 4)
(3, 6)
(3, 7)
(4, 6)
(4, 7)
(6, 7)
I hope this will solve your problem.

For the list creation you can do something like:
import random
max_id = 999
min_id = 0
original_values = range(min_id, max_id + 1)  # 0..999 inclusive; could be an arbitrary list
n_edges = ...  # some number: replace ... with how many edges you want
my_edge_list = [(random.choice(original_values), random.choice(original_values))
                for _ in range(n_edges)]
To check that every value appears in the edge list, you can do the following:
vals = {v for tup in my_edge_list for v in tup}
assert all(v in vals for v in original_values)
The assert checks that every node is represented in your edges. To make it unlikely that you hit that assert, you can do a couple of things:
Sample without replacement from your list of integers until they are all used up to create a "base network", and then randomly add more edges to your heart's desire.
Make n_edges high enough that your condition is very likely to be met. If it's not, try again...
It really depends on what you're going to use the network for and what kind of structure you want it to have.
EDIT: I have updated my response to be more robust to an arbitrary list of values rather than requiring a sequential list.
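The "base network" suggestion (sample without replacement until every node is used, then add random edges) could look like the sketch below. It is only a sketch. random_edge_list and its parameters are hypothetical names. With an odd number of nodes, the leftover node is connected to a random partner. The extra edges can include self-loops or repeats, just like the random.choice-based pairs.

```python
import random

def random_edge_list(nodes, n_extra_edges=0, seed=None):
    """Hypothetical helper: random edges in which every node appears at least once."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    if len(shuffled) < 2:
        raise ValueError("need at least two nodes")
    rng.shuffle(shuffled)  # a random order = sampling without replacement until all are used
    # Base network: pair consecutive nodes of the shuffled list
    edges = [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled) - 1, 2)]
    if len(shuffled) % 2 == 1:
        # Odd count: the last node has no partner yet, so connect it to a random other node
        edges.append((shuffled[-1], rng.choice(shuffled[:-1])))
    # Extra random edges on top of the base network (self-loops and repeats are possible)
    for _ in range(n_extra_edges):
        edges.append((rng.choice(shuffled), rng.choice(shuffled)))
    return edges

edge_list = random_edge_list(range(1000), n_extra_edges=500)
covered = {v for edge in edge_list for v in edge}
print(len(edge_list), covered == set(range(1000)))
```

Because every node is placed in the base network first, the coverage check from the assert above always passes, however many extra edges you add.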

import random
from random import randint
random.seed()  # seeds from OS randomness (or the current time); this also happens automatically on import
# to generate 100 tuples with randints in range 0-99
li = [(randint(0, 99), randint(0, 99)) for i in range(100)]
print(li)
[(80, 55), (3, 10), (66, 65), (26, 23), (8, 72), (83, 25), (24, 99), (72, 9), (52, 76), (72, 68), (67, 25), (72, 18), (94, 62), (7, 62), (49, 94), (29, 89), (11, 38), (52, 51), (19, 32), (20, 85), (56, 61), (4, 40), (97, 58), (82, 2), (50, 82), (77, 5), (2, 9), (2, 46), (39, 4), (74, 40), (69, 15), (1, 77), (45, 58), (80, 59), (85, 80), (27, 80), (81, 4), (22, 33), (77, 60), (75, 87), (43, 36), (60, 34), (90, 54), (75, 3), (89, 84), (51, 93), (62, 64), (81, 50), (15, 60), (33, 97), (42, 62), (83, 26), (13, 33), (41, 87), (29, 63), (4, 32), (6, 14), (79, 73), (95, 4), (41, 16), (96, 64), (15, 28), (35, 13), (35, 82), (77, 16), (63, 27), (75, 37), (11, 52), (21, 35), (37, 96), (9, 86), (83, 11), (5, 42), (34, 32), (17, 8), (65, 55), (58, 19), (90, 40), (18, 75), (29, 14), (0, 11), (25, 68), (34, 52), (22, 8), (12, 53), (16, 49), (73, 54), (78, 80), (74, 60), (40, 68), (69, 20), (37, 38), (74, 60), (53, 90), (25, 48), (44, 52), (49, 27), (28, 35), (29, 94), (35, 60)]

Here is a solution that first generates a random population of nodes (pop1), then shuffles it (pop2) and combines those into a list of pairs.
Note: this method only yields edges in which each node is the start exactly once and the end exactly once, so it may not be what you're after. See below for another method.
import random
random.seed() # seeds from OS randomness or the current time
# extract a number of samples - the number of nodes you want
pop1 = random.sample(range(1000), 10)
pop2 = list(pop1)  # a plain copy is enough for a list of ints
random.shuffle(pop2)
# generate pairs from the same population - this guarantees your constraint
pairs = list(zip(pop1, pop2))
print(pairs)
Output:
[(17, 347), (812, 688), (347, 266), (731, 342), (342, 49), (904, 17), (49, 731), (50, 904), (688, 50), (266, 812)]
Here is another method.
This allows for duplicate occurrences of the nodes.
The idea is to draw start and end nodes from the same population:
import random
random.seed()
population = range(10) # any population would do
# choose randomly from the population for both ends
# so you can have duplicates
pairs = [(random.choice(population), random.choice(population)) for _ in range(100)]
print(pairs[:10])
Output:
[(1, 9), (7, 1), (8, 6), (4, 7), (6, 2), (7, 3), (0, 2), (1, 0), (8, 3), (8, 3)]

Python 3: Converting A Tuple To A String

I have the following code:
var_one = var_two[var_three-1]
var_one = "string_one" + var_1
And I need to do the following to it:
var_four = 'string_two', var_one
However, this raises the following error:
TypeError: Can't convert 'tuple' object to str implicitly
I have tried things such as str(var_one) and using strip but these did not work.
What can I do to achieve the result I require?
EDIT - Here are what the variables contain:
var_one: new variable
var_two: tuple
var_three: integer
var_four: new
EDIT2:
The line in the program that makes the error is: os.system(var_four)

one_big_string = ''.join(my_tuple)  # my_tuple: your tuple of strings (naming it tuple would shadow the built-in)
print(one_big_string)

What you've written is fine:
>>> x = 1
>>> y = 1, x
>>>
The problem is that somewhere else in your code, you're using var_four, which is a tuple, where a string is expected.
BTW, I think it's neater to put parentheses around tuples like this; otherwise I tend to think they're being used in tuple unpacking.
EDIT: There are all sorts of ways to join and format strings -- Python is good at that. In somewhat-decreasing order of generality:
"{first_thing} {second_thing}".format(first_thing=var_one, second_thing=var_two)
"{0} {1}".format(var_one, var_two)
var_one + var_two

Your code looks fine as is.
Try running import pdb; pdb.set_trace() in your program to see if you can find the line triggering the error.
EDIT: You'll want to use ''.join(var_four) to convert var_four into a string before passing it to whatever you want to use it with. Note that this creates a new string; it does not overwrite var_four. See Python 3 string.join() equivalent?
Also, you should be using the subprocess module instead of os.system. See the Python 3.x documentation.
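One detail about ''.join: it puts nothing between the pieces, so for a command line you most likely want ' '.join instead. Here is a small illustration, with made-up values standing in for the question's variables:

```python
# Hypothetical values: var_one stands in for whatever string the question builds
var_one = "string_one"
var_four = ("string_two", var_one)  # the comma makes this a tuple, not a string

glued = "".join(var_four)    # pieces glued together, no separator
spaced = " ".join(var_four)  # pieces separated by a single space
print(repr(glued))   # 'string_twostring_one'
print(repr(spaced))  # 'string_two string_one'
```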

os.system expects a string, which it will execute in the shell, but you're giving it a tuple instead.
Imagine we want to run the command rm -rf /home/mike. You might be doing something like:
import os
binary_and_option = 'rm -rf'
directory = '/home/mike'
command = binary_and_option, directory # This is the tuple
# ('rm -rf', '/home/mike')
# it is NOT the string
# 'rm -rf /home/mike'
os.system(command) # this clearly won't work, since it's just
# os.system(('rm -rf', '/home/mike'))
What you want to do instead is
command = "%s %s" % (binary_and_option, directory)
to assemble the string. You are probably thinking that a comma joins the str() of each object with spaces in between, but that only happens in print; it's not how strings work in general.
But wait, there's more! You never want to use os.system, especially when you're going to build commands. It invokes the shell (which introduces unnecessary security risks and other penalties) and has an inflexible API. Instead, use the subprocess module.
import subprocess
binary_and_option = ['rm', '-rf']
directory = '/home/mike'
command = binary_and_option + [directory]
subprocess.call(command)
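Here is a harmless way to see what passing a list does. Each element reaches the program as exactly one argument, and no shell parses it, so spaces and other special characters survive as-is. The sketch swaps rm for the current Python interpreter (sys.executable) and uses subprocess.run (capture_output and text need Python 3.7+) instead of subprocess.call. The path is made up, and run_without_shell is a hypothetical helper name.

```python
import subprocess
import sys

def run_without_shell(args):
    # Each list element is passed to the program as exactly one argument; no shell is involved
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

# Harmless stand-in for the rm example: a child Python process echoes its first argument
directory = "/home/mike/My Documents"  # made-up path containing a space
out = run_without_shell([sys.executable, "-c", "import sys; print(sys.argv[1])", directory])
print(out, end="")
```

With os.system and a string such as 'rm -rf /home/mike/My Documents', the shell would split that path into two arguments at the space.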

str(my_tuple)
This seems too easy, but this works in Python 3.6
>>> x = list(range(100))
>>> y = list(range(500, 600))
>>> zip_obj = zip(x, y)
>>> my_tuple = tuple(zip_obj)
>>> type(my_tuple)
<class 'tuple'>
>>> tuple_str = str(my_tuple)
>>> tuple_str
'((0, 500), (1, 501), (2, 502), (3, 503), (4, 504), (5, 505), (6, 506), (7, 507), (8, 508), (9, 509), (10, 510), (11, 511), (12, 512), (13, 513), (14, 514), (15, 515), (16, 516), (17, 517), (18, 518), (19, 519), (20, 520), (21, 521), (22, 522), (23, 523), (24, 524), (25, 525), (26, 526), (27, 527), (28, 528), (29, 529), (30, 530), (31, 531), (32, 532), (33, 533), (34, 534), (35, 535), (36, 536), (37, 537), (38, 538), (39, 539), (40, 540), (41, 541), (42, 542), (43, 543), (44, 544), (45, 545), (46, 546), (47, 547), (48, 548), (49, 549), (50, 550), (51, 551), (52, 552), (53, 553), (54, 554), (55, 555), (56, 556), (57, 557), (58, 558), (59, 559), (60, 560), (61, 561), (62, 562), (63, 563), (64, 564), (65, 565), (66, 566), (67, 567), (68, 568), (69, 569), (70, 570), (71, 571), (72, 572), (73, 573), (74, 574), (75, 575), (76, 576), (77, 577), (78, 578), (79, 579), (80, 580), (81, 581), (82, 582), (83, 583), (84, 584), (85, 585), (86, 586), (87, 587), (88, 588), (89, 589), (90, 590), (91, 591), (92, 592), (93, 593), (94, 594), (95, 595), (96, 596), (97, 597), (98, 598), (99, 599))'
>>>
