Appending line points with same slope to Python dict - python

I am trying to write a function that outputs all the points lying on the same line. I determine this from the fact that the slope between each pair of such points must be the same.
I have iterated through the input file to get a list of points and calculated the slopes. My next step is to put them in a HashMap (a dict in Python), with the slope as the key and the points as the value. If the slope for two points is already present, the points should be added to the same entry, with any duplicates removed.
I was able to extract the input and calculate the slopes. However, putting them in the hashmap is a bit challenging for me, as I keep trying to use the Java-like syntax I am familiar with.
Can someone help me update the hashmap while ensuring that no duplicates are inserted?
Here is what I have done so far:
slopeMap = {}
for x in range (0, len(arr)):
    for y in range (x+1, len(arr)):
        slopeForPoints = (slope(arr[x][0], arr[y][0], arr[x][1], arr[y][1]))
        if slopeMap.has_key(slopeForPoints) == False:
            slopeMap[slopeForPoints].append()
        "slopeForPoints" in slopeMap
        slopeMap["slopeForPoints"] =
        a.setdefault("somekey",[]).append("bob")
        print slopeForPoints
I just need help with the above function. Slope and iterate function I was able to get working.
Sample slope values (Key- HashMap)
0.0
1.0
0.0
0.9
Sample point values (Value - HashMap)
0.0,0.0
1.1,1.1
3.5,4.5
2.2,2.2

As mentioned by Mad Physicist, you need to calculate more than just the slope to identify unique lines, as parallel lines have the same slope but are not necessarily the same line.
There are a few options for this. One such option is to make the keys of your dictionary tuples, such as (slope, intercept). Then to make sure the points are unique, you could make the values for your dictionary sets of tuples.
The idea would look something like this:
slope, intercept = slope_intercept(point1, point2)  # each point is (point_x, point_y)
# the slope_intercept function still needs to be written
if (slope, intercept) not in slopeMap:
    slopeMap[(slope, intercept)] = set()  # could be done with a defaultdict instead
slopeMap[(slope, intercept)].add(point1)
slopeMap[(slope, intercept)].add(point2)
Note, it's more Pythonic (and, since has_key was removed in Python 3, more portable) to say
if slopeForPoints not in slopeMap:
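For reference, here is a minimal sketch of how the pieces could fit together. The slope_intercept body and the group_points_by_line name are assumptions (neither is defined in the question), vertical lines are keyed as (None, x) because their slope is undefined, and the points are assumed to be distinct tuples. Float keys are fragile, so real data may need rounding before being used as dictionary keys.

```python
from collections import defaultdict
from itertools import combinations


def slope_intercept(p1, p2):
    """Return a hashable key for the line through two distinct points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Vertical line: the slope is undefined, so key it by its x position.
        return (None, x1)
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return (slope, intercept)


def group_points_by_line(points):
    """Map each (slope, intercept) key to the set of points on that line."""
    lines = defaultdict(set)  # missing keys start as an empty set
    for p1, p2 in combinations(points, 2):
        # Sets silently ignore points that were already added.
        lines[slope_intercept(p1, p2)].update((p1, p2))
    return dict(lines)


points = [(0, 0), (1, 1), (2, 2), (0, 1)]
lines = group_points_by_line(points)
print(lines[(1.0, 0.0)])  # the three points on y = x
```

Using a defaultdict(set) removes the need for the explicit "not in slopeMap" check, and the tuple key keeps parallel lines apart.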

Related

Is there a more efficient and robust way to create a minimum proximity algorithm for a distance matrix?

I am trying to make an algorithm that propagates from point to point in a distance matrix using the smallest distance in the proximity. The code has two conditions: the minimum distance must be no less than 0 and each point must be visited once and return to the starting position.
This is my code in its entirety:
def totalDistance(aList):
    path = []
    for j in range(0,len(aList)):
        k=j
        order = []
        for l in range(0,len(aList)):
            order.append(k)
            initval= min(x for x in aList[k] if x > 0 )
            k = aList[k].index(initval)
            for s in range(0,len(aList)):
                for t in range(0,len(aList[s])):
                    aList[s][k] = 0
        path.append(order)
    return path
The code is meant to return the indexes of the points within the closest proximity of the evaluated point.
aList = [[0,3,4,6],[3,0,7,3],[4,7,0,9],[6,3,9,0]] and represents the distance matrix.
When running the code, I get the following error:
initval= min(x for x in aList[k] if x > 0 )
ValueError: min() arg is an empty sequence
I presume that when I make the columns in my distance matrix zero with the following function:
for s in range(0,len(aList)):
    for t in range(0,len(aList[s])):
        aList[s][k] = 0
the min() function is unable to find a value with the given conditions. Is there a better way to format my code such that this does not occur, or a better approach to this problem altogether?
One technique and a pointer on the rest that you say is working...
To prevent re-visiting or backtracking, one of the common design patterns is to keep a separate data structure to "mark" the places you've been. Because your points are numerically indexed, you could use a list of booleans, but I think it is much easier to just keep a set of the places you've been. Something like this...
visited = set() # places already seen
# If I decide to visit point/index "3"...
visited.add(3)
It is not really a great practice to modify your input data as you are doing, and especially so if you are looping over it, which you are... it leads to headaches.
So then... Your current error occurs because when you screen the rows for x>0 you eventually get an empty sequence, since you keep zeroing values, and then min() chokes. The set above can fix that: you don't need to zero anything out, just mark the points as visited.
Then, the obvious question... how to use the marks? You can just use them as part of your search. They work well with the enumerate function, which yields each index together with its value.
Try something like this, which will make a list of "eligible" tuples with the distance and index location.
pts_to_consider = [(dist, idx) for idx, dist in enumerate(aList[k])
                   if dist > 0
                   and idx not in visited]
There are other ways to do this with numpy and other things, but this is a reasonable approach and close to what you have in code now. Comment back if stuck. I don't want to give away the whole farm because this is probably H/W. Perhaps you can use some of the hints here.
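Putting these hints together, one possible shape of the loop is sketched below. The function name nearest_neighbor_tour and the start parameter are made up for illustration, and the sketch assumes a square matrix in which a distance of 0 means "no move". It leaves the input matrix untouched and uses the visited set instead of zeroing columns.

```python
def nearest_neighbor_tour(dist, start):
    """Greedy tour: always move to the closest unvisited point, then return."""
    visited = {start}  # places already seen
    order = [start]
    k = start
    while len(visited) < len(dist):
        # Eligible moves: positive distance and not yet visited.
        pts_to_consider = [(d, idx) for idx, d in enumerate(dist[k])
                           if d > 0 and idx not in visited]
        if not pts_to_consider:
            break  # nothing reachable is left, so stop instead of calling min() on nothing
        _, k = min(pts_to_consider)  # smallest distance wins; ties go to the lower index
        visited.add(k)
        order.append(k)
    order.append(start)  # return to the starting position
    return order


aList = [[0, 3, 4, 6], [3, 0, 7, 3], [4, 7, 0, 9], [6, 3, 9, 0]]
print(nearest_neighbor_tour(aList, 0))  # [0, 1, 3, 2, 0]
```

Calling it once per starting index gives one candidate path per point, which matches what the original outer loop was trying to collect.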

Change next list element during iteration?

Imagine you have a list of points in the 2D-space. I am trying to find symmetric points.
To do that I iterate over my list of points and apply symmetry operations. Suppose I apply one of these operations to the first point and the result is equal to another point in the list. These 2 points are symmetric.
So what I want is to erase this other point from the list that I am iterating over, so that my iterating variable, say "i", won't take this value, because I already know that it is symmetric with the first point.
I have seen similar posts, but they remove a value from the list that has already been visited. What I want is to remove subsequent values.
Add every point that turns out to be symmetric to a set. A set keeps only unique elements and lookup is O(1), so you can use an "if point not in s" condition:
if point not in s:
    # test for symmetry
    if symmetric:
        s.add(point)
In general it is a bad idea to remove values from a list you are iterating over. There are, however, other ways to skip the symmetric points. For example, you can check for each point whether you have seen a symmetric one before:
for i, point in enumerate(points):
    if symmetric(point) not in points[:i]:
        # Do whatever you want to do
        pass
Here symmetric produces a point according to your symmetry operation. If your symmetry operation connects more than two points you can do
for i, point in enumerate(points):
    for sympoint in symmetric(point):
        if sympoint in points[:i]:
            break
    else:
        # Do whatever you want to do
        pass
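As a concrete illustration, here is a small sketch that combines both ideas: it skips symmetric partners without removing anything from the list, and it uses a set so the lookup does not rescan points[:i] each time. The reflect_y operation (mirroring across the y axis) and the name unique_under_symmetry are assumptions chosen only for the example; substitute your own symmetry operation. Points are assumed to be hashable tuples.

```python
def reflect_y(point):
    """Example symmetry operation: mirror a point across the y axis."""
    x, y = point
    return (-x, y)


def unique_under_symmetry(points, symmetric=reflect_y):
    """Keep the first point of every symmetric pair, skipping later partners."""
    seen = set()
    kept = []
    for point in points:
        # Skip the point if it, or its mirror image, was already handled.
        if point in seen or symmetric(point) in seen:
            continue
        seen.add(point)
        kept.append(point)
    return kept


points = [(1, 2), (-1, 2), (0, 3), (3, 1), (-3, 1)]
print(unique_under_symmetry(points))  # [(1, 2), (0, 3), (3, 1)]
```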

Two column data with multiple minimums

I am trying to read two column data and find all the minimums it has. Graph looks like this
x axis is time and y axis is flux. Data can be seen with this link
https://onedrive.live.com/redir?resid=1E870F010DBA8407!298&authkey=!ABdG6FJ_i3d9oWI&ithint=file%2ctxt
I couldn't find a good algorithm. I also tried to fit a curve to identify the minimums more easily, but the results weren't correct. Which statistical method is suitable for this job? I am using Python and C.
I'll be happy if you share your ideas.
The first thing to do is to sort the list of points along the x axis, otherwise it is going to be an absolute pain. Then you can use:
minima_indices=[i for i in range(1,len(y_list)-1) if y_list[i-1]>=y_list[i]<=y_list[i+1]]
This should give you the indices of the minima in the sorted list. Note that it omits the first and last point, if you want them to be included this can be done easily.
If you only want the deep minima in your graph then you can filter out all the small minima at the end (or during the original list comprehension by adding a condition):
def approx(a, b, tol):
    # True when a and b differ by less than tol
    return abs(a - b) < tol
minima_indices_filtered=[i for i in minima_indices if not approx(y_list[i],y_0,tol)]
y_0 is the value of the flat line at the top of your picture, and tol is how deep a minimum has to be before it registers as a minimum.
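A short self-contained sketch of the two steps follows. The function names local_minima and deep_minima, the baseline parameter (playing the role of y_0), and the sample flux values are all made up for illustration; the values are assumed to be already sorted by time.

```python
def local_minima(ys):
    """Indices of interior points that are no higher than both neighbours."""
    return [i for i in range(1, len(ys) - 1)
            if ys[i - 1] >= ys[i] <= ys[i + 1]]


def deep_minima(ys, baseline, tol):
    """Keep only the minima that differ from the baseline by at least tol."""
    return [i for i in local_minima(ys) if abs(ys[i] - baseline) >= tol]


# Hypothetical flux values: two shallow dips (noise) and two deep ones.
flux = [1.0, 0.98, 1.0, 0.6, 1.0, 0.99, 1.0, 0.5, 1.0]
print(local_minima(flux))           # [1, 3, 5, 7]
print(deep_minima(flux, 1.0, 0.1))  # [3, 7]
```

Note that on a perfectly flat stretch every point satisfies the comparison, so the depth filter is what removes those from the result.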

Sorting points on multiple lines

Given that we have two lines on a graph (I just noticed that I inverted the numbers on the Y axis, this was a mistake, it should go from 11-1)
And we only care about whole number X axis intersections
We need to order these points from highest Y value to lowest Y value regardless of their position on the X axis (Note I did these pictures by hand so they may not line up perfectly).
I have a couple of questions:
1) I have to assume this is a known problem, but does it have a particular name?
2) Is there a known optimal solution when dealing with tens of billions (or hundreds of millions) of lines? Our current process of manually calculating each point and then comparing it to a giant list requires hours of processing. Even though we may have a hundred million lines, we typically only want the top 100 or 50,000 results; some of the lines are so far "below" other lines that calculating their points is unnecessary.
Your data structure is a set of tuples
lines = {(y0, Δy0), (y1, Δy1), ...}
You need only the ntop points, hence build a set containing only
the top ntop yi values, with a single pass over the data
top_points = choose(lines, ntop)
EDIT --- to choose the ntop we had to keep track of the smallest
one, and this is interesting info, so let's return also this value
from choose, also we need to initialize decremented
top_points, smallest = choose(lines, ntop)
decremented = top_points
and start a loop...
while True:
Generate a set of decremented values
    decremented = {(y-Δy, Δy) for y, Δy in top_points}
    decremented = {(y-Δy, Δy) for y, Δy in decremented if y>smallest}
    if not decremented: break  # an empty set never compares equal to {}, which is an empty dict
Generate a set of candidates
    candidates = top_points.union(decremented)
Generate a new set of top points
    new_top_points, smallest = choose(candidates, ntop)
The following check is no longer necessary
check if new_top_points == top_points
    if new_top_points == top_points: break
    top_points = new_top_points
of course we are in a loop...
The difficult part is the choose function, but I think that this
answer to the question
How can I sort 1 million numbers, and only print the top 10 in Python?
could help you.
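One way the choose function could look is sketched below, using heapq.nlargest from the standard library, which keeps only ntop items in memory during a single pass. This is an assumption about its shape based on how it is called above (it returns the top set plus the smallest y value in it); dy stands in for Δy.

```python
import heapq


def choose(lines, ntop):
    """Return the ntop tuples with the largest y, plus the smallest y kept."""
    # Each element is a (y, dy) tuple; rank by y only.
    top = heapq.nlargest(ntop, lines, key=lambda line: line[0])
    smallest = top[-1][0] if top else None  # nlargest returns items in descending order
    return set(top), smallest


lines = {(10, 1), (7, 2), (3, 1), (9, 3)}
top_points, smallest = choose(lines, 2)
print(top_points, smallest)
```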
It's not a really complicated thing, just a "normal" sorting problem.
Usually sorting requires a large amount of computing time, but your case is one where you don't need complex sorting techniques.
The values on both graphs rise or fall constantly; there are no "jumps". You can use this to your advantage. The basic algorithm:
1) Identify whether each graph is rising or falling.
2) Write a generator that yields the values: from left to right if rising, from right to left if falling.
3) Get the first value from both graphs.
4) Insert the lower one into the result list.
5) Get a new value from the graph that had the lower value.
6) Repeat the last two steps until one generator is "empty".
7) Append the leftover items from the other generator.
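The steps above can be sketched as follows. The standard library's heapq.merge performs exactly this kind of merge of already-sorted inputs, so it is used here in place of a hand-written loop. The ascending_values helper and the (y0, dy, n) description of a line (start value, step per whole-number x, number of points) are assumptions made for the example.

```python
import heapq


def ascending_values(y0, dy, n):
    """Yield the n values y0 + dy * x (x = 0..n-1) in ascending order."""
    # Rising line: walk left to right. Falling line: walk right to left.
    xs = range(n) if dy >= 0 else range(n - 1, -1, -1)
    for x in xs:
        yield y0 + dy * x


rising = ascending_values(1, 2, 4)     # 1, 3, 5, 7
falling = ascending_values(10, -3, 4)  # 1, 4, 7, 10 (read right to left)
merged = list(heapq.merge(rising, falling))
print(merged)  # [1, 1, 3, 4, 5, 7, 7, 10]
```

Because heapq.merge is lazy, wrapping it in itertools.islice would let you stop after the first N results without generating the rest.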

Getting Keys Within Range/Finding Nearest Neighbor From Dictionary Keys Stored As Tuples

I have a dictionary which has coordinates as keys. They are by default in 3 dimensions, like dictionary[(x,y,z)]=values, but may be in any dimension, so the code can't be hard coded for 3.
I need to find if there are other values within a certain radius of a new coordinate, and I ideally need to do it without having to import any plugins such as numpy.
My initial thought was to split the input into a cube and check no points match, but obviously that is limited to integer coordinates, and would grow exponentially slower (radius of 5 would require 729x the processing), and with my initial code taking at least a minute for relatively small values, I can't really afford this.
I heard finding the nearest neighbor may be the best way, and ideally, cutting down the keys used to a range of +- a certain amount would be good, but I don't know how you'd do that when there's more than one point being used. Here's how I'd do it with my current knowledge:
dimensions = 3
minimumDistance = 0.9
#example dictionary + input
dictionary[(0,0,0)]=[]
dictionary[(0,0,1)]=[]
keyToAdd = [0,1,1]
closestMatch = 2**1000
tooClose = False
for keys in dictionary:
    #calculate distance to new point
    originalCoordinates = str(split( dictionary[keys], "," ) ).replace("(","").replace(")","")
    for i in range(dimensions):
        distanceToPoint = #do pythagoras with originalCoordinates and keyToAdd
    #if you want the overall closest match
    if distanceToPoint < closestMatch:
        closestMatch = distanceToPoint
    #if you want to just check it's not within that radius
    if distanceToPoint < minimumDistance:
        tooClose = True
        break
However, performing calculations this way may still run very slow (it must do this to millions of values). I've searched the problem, but most people seem to have simpler sets of data to do this to. If anyone can offer any tips I'd be grateful.
You say you need to determine IF there are any keys within a given radius of a particular point. Thus, you only need to scan the keys, computing the distance of each to the point until you find one within the specified radius. (And if you do comparisons to the square of the radius, you can avoid the square roots needed for the actual distance.)
One optimization would be to sort the keys based on their "Manhattan distance" from the point (that is, add the absolute component offsets), since the Euclidean distance will never be greater than this: any key whose Manhattan distance is within the radius is certainly within range. This would avoid some of the more expensive calculations (though I don't think you need any trigonometry).
If, as you suggest later in the question, you need to handle multiple points, you can obviously process each individually, or you could find the center of those points and sort based on that.
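A minimal sketch of the plain scan with squared distances is shown below. It uses no external libraries and works for any number of dimensions because it zips the key with the new point. The function name any_within_radius is made up for the example, and "within" is taken to mean strictly closer than the radius, matching the comparison in the question.

```python
def any_within_radius(keys, point, radius):
    """Return True if any key lies strictly closer to point than radius."""
    radius_sq = radius * radius  # compare squared values to avoid square roots
    for key in keys:
        dist_sq = 0.0
        for a, b in zip(key, point):
            dist_sq += (a - b) ** 2
            if dist_sq >= radius_sq:
                break  # already too far, no need to add the remaining dimensions
        else:
            return True  # the loop finished without exceeding the radius
    return False


dictionary = {(0, 0, 0): [], (0, 0, 1): []}
keyToAdd = (0, 1, 1)
print(any_within_radius(dictionary, keyToAdd, 0.9))  # False: the nearest key is 1.0 away
print(any_within_radius(dictionary, keyToAdd, 1.1))  # True
```

The scan stops at the first key found inside the radius, so it only visits every key when nothing is close enough.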
