Change next list element during iteration? - python

Imagine you have a list of points in the 2D-space. I am trying to find symmetric points.
For doing that I iterate over my list of points and apply symmetry operations. So suppose I apply one of these operations to the first point and after this operation it is equal to other point in the list. These 2 points are symmetric.
So what I want is to erase this other point from the list that I am iterating so in this way my iterating variable say "i" won't take this value. Because I already know that it is symmetric with the first point.
I have seen similar Posts but they remove a value in the list that they have already taken. What I want is to remove subsequent values.

Whatever symmetric points turn out to be True add them to a set, since set maintains unique elements and look up is O(1) you can use if point not in set condition.
if point not in s:
#test for symmetry
if symmetric:
s.add(point)

In general it is a bad idea to remove values from a list you are iterating over. There are, however, another ways to skip the symmetric points. For example, you can check for each point if you have seen a symmetric one before:
for i, point in enumerate(points):
if symmetric(point) not in points[:i]:
# Do whatever you want to do
Here symmetric produces a point according to your symmetry operation. If your symmetry operation connects more that two points you can do
for i, point in enumerate(points):
for sympoint in symmetric(point):
if sympoint in points[:i]:
break
else:
# Do whatever you want to do

Related

Selecting an item (from a set of items) based on distance and frequency of occurence

There exists a set of points (or items, it doesn't matter). Each point a is at a specific distance from other points in the set. The distance can be retrieved via the function retrieve_dist(a, b).
This question is about programming (in Python) an algorithm to pick a point, with replacement, from this set of points. The picked point:
i) has to be at the maximum possible distance from all already-selected points, while adhering to the requirement in (ii)
ii) the number of times an already-selected point occurs in the sample must carry weight in this calculation. I.e. more frequently-selected points should be weighed more heavily.
E.g. imagine a and b have already been selected (100 and 10 times respectively). Then when the next point is to be selected, it's distance from a matters more than its distance from b, in line with the frequency of occurrence of a in the already-selected sample.
What I can try:
This would have been easy to accomplish if weights/frequencies weren't in play. I could do:
distances = defaultdict(int)
for new_point in set_of_points:
for already_selected_point in selected_points:
distances[new_point] += retrieve_dist(new_point, already_selected_point)
Then I'd sort distances.items() by the second entry in each tuple, and would get the desired item to select.
However, when frequencies of already-selected points come into play, I just can't seem to wrap my head around this problem.
Can an expert help out? Thanks in advance.
A solution to your problem would be to make selected_points a list rather than a set. In this case, each new point is compared to a and b (and all other points) as many times as they have already been found.
If each point is typically found many times, it might be possible to improve perfomance using a dict instead, with the key being the points, and the value being the number of times each point is selected. In that case I think your algorithm would be
distances = defaultdict(int)
for new_point in set_of_points:
for already_selected_point, occurances in selected_points.items():
distances[new_point] += occurances * retrieve_dist(new_point, already_selected_point)

Appending line points with same slope to Python dict

I am trying to write a function where I can spit out all the points in the same line. I am calculating that by the fact, that the slope between two pairs of points must be same for that.
I have iterated through input file to get a list of points and calculated slope. My next step would be to put them on a HashMap (or Dict in Python), with the key being the slope and update it with points and slope. If slope for those two numbers is already present, add the points to same entry and remove any duplicates.
I was able to extract input, calculate slope and put them on in a hashmap. However, putting them on hashmap is a bit challenging for me as I am trying to use Java-like syntax which I am familiar with.
Can someone help me with updating the hashmap ensuring no dups are inserted?
here is what I have done so far:
slopeMap = {}
for x in range (0, len(arr)):
for y in range (x+1, len(arr)):
slopeForPoints = (slope(arr[x][0], arr[y][0], arr[x][1], arr[y][1]))
if slopeMap.has_key(slopeForPoints) == False:
slopeMap[slopeForPoints].append()
"slopeForPoints" in slopeMap
slopeMap["slopeForPoints"] =
a.setdefault("somekey",[]).append("bob")
print slopeForPoints
I just need help with the above function. Slope and iterate function I was able to get working.
Sample slope values (Key- HashMap)
0.0
1.0
0.0
0.9
Sample point values (Value - HashMap)
0.0,0.0
1.1,1.1
3.5,4.5
2.2,2.2
As mentioned by Mad Physicist, you need to calculate the more than just the slope to identify unique lines, as parallel lines would have the same slope but not necessarily be the same line.
There are a few options for this. One such option is to make the keys of your dictionary tuples, such as (slope, intercept). Then to make sure the points are unique, you could make the values for your dictionary sets of tuples.
The idea would look something like this:
slope, intercept = slope_intercept(point1, point2) #Each point is (point_x, point_y)
#Need to write the slope_intercept function
if (slope, intercept) not in slopeMap:
slopeMap[(slope,intercept)] = set() #Could be done with a defaultDict instead
slopeMap[(slope,intercept)].add(point1))
slopeMap[(slope,intercept)].add(point2))
Note, it's more Pythonic to say
if slopeForPoints not in slopeMap:

Sorting points on multiple lines

Given that we have two lines on a graph (I just noticed that I inverted the numbers on the Y axis, this was a mistake, it should go from 11-1)
And we only care about whole number X axis intersections
We need to order these points from highest Y value to lowest Y value regardless of their position on the X axis (Note I did these pictures by hand so they may not line up perfectly).
I have a couple of questions:
1) I have to assume this is a known problem, but does it have a particular name?
2) Is there a known optimal solution when dealing with tens of billions (or hundreds of millions) of lines? Our current process of manually calculating each point and then comparing it to a giant list requires hours of processing. Even though we may have a hundred million lines we typically only want the top 100 or 50,000 results some of them are so far "below" other lines that calculating their points is unnecessary.
Your data structure is a set of tuples
lines = {(y0, Δy0), (y1, Δy1), ...}
You need only the ntop points, hence build a set containing only
the top ntop yi values, with a single pass over the data
top_points = choose(lines, ntop)
EDIT --- to choose the ntop we had to keep track of the smallest
one, and this is interesting info, so let's return also this value
from choose, also we need to initialize decremented
top_points, smallest = choose(lines, ntop)
decremented = top_points
and start a loop...
while True:
Generate a set of decremented values
decremented = {(y-Δy, Δy) for y, Δy in top_points}
decremented = {(y-Δy, Δy) for y, Δy in decremented if y>smallest}
if decremented == {}: break
Generate a set of candidates
candidates = top_lines.union(decremented)
generate a new set of top points
new_top_points, smallest = choose(candidates, ntop)
The following is no more necessary
check if new_top_points == top_points
if new_top_points == top_points: break
top_points = new_top_points</strike>
of course we are in a loop...
The difficult part is the choose function, but I think that this
answer to the question
How can I sort 1 million numbers, and only print the top 10 in Python?
could help you.
It's not a really complicated thing, just a "normal" sorting problem.
Usually sorting requires a large amount of computing time. But your case is one where you don't need to use complex sorting techniques.
You on both graphs are growing or falling constantly, there are no "jumps". You can use this to your advantage. The basic algorithm:
identify if a graph is growing or falling.
write a generator, that generates the values; from left to right if raising, form right to left if falling.
get the first value from both graphs
insert the lower on into the result list
get a new value from the graph that had the lower value
repeat the last two steps until one generator is "empty"
append the leftover items from the other generator.

Getting Keys Within Range/Finding Nearest Neighbor From Dictionary Keys Stored As Tuples

I have a dictionary which has coordinates as keys. They are by default in 3 dimensions, like dictionary[(x,y,z)]=values, but may be in any dimension, so the code can't be hard coded for 3.
I need to find if there are other values within a certain radius of a new coordinate, and I ideally need to do it without having to import any plugins such as numpy.
My initial thought was to split the input into a cube and check no points match, but obviously that is limited to integer coordinates, and would grow exponentially slower (radius of 5 would require 729x the processing), and with my initial code taking at least a minute for relatively small values, I can't really afford this.
I heard finding the nearest neighbor may be the best way, and ideally, cutting down the keys used to a range of +- a certain amount would be good, but I don't know how you'd do that when there's more the one point being used.Here's how I'd do it with my current knowledge:
dimensions = 3
minimumDistance = 0.9
#example dictionary + input
dictionary[(0,0,0)]=[]
dictionary[(0,0,1)]=[]
keyToAdd = [0,1,1]
closestMatch = 2**1000
tooClose = False
for keys in dictionary:
#calculate distance to new point
originalCoordinates = str(split( dictionary[keys], "," ) ).replace("(","").replace(")","")
for i in range(dimensions):
distanceToPoint = #do pythagors with originalCoordinates and keyToAdd
#if you want the overall closest match
if distanceToPoint < closestMatch:
closestMatch = distanceToPoint
#if you want to just check it's not within that radius
if distanceToPoint < minimumDistance:
tooClose = True
break
However, performing calculations this way may still run very slow (it must do this to millions of values). I've searched the problem, but most people seem to have simpler sets of data to do this to. If anyone can offer any tips I'd be grateful.
You say you need to determine IF there are any keys within a given radius of a particular point. Thus, you only need to scan the keys, computing the distance of each to the point until you find one within the specified radius. (And if you do comparisons to the square of the radius, you can avoid the square roots needed for the actual distance.)
One optimization would be to sort the keys based on their "Manhattan distance" from the point (that is, add the component offsets), since the Euclidean distance will never be less than this. This would avoid some of the more expensive calculations (though I don't think you need and trigonometry).
If, as you suggest later in the question, you need to handle multiple points, you can obviously process each individually, or you could find the center of those points and sort based on that.

Mapping a range of integers to a single integer

I have a function which receives an integer as an input and depending on what range this input lies in, assigns to it a difficulty value. I know that this can be done using if else loops. I was wondering whether there is a more efficient/cleaner way to do it.
I tried to do something like this
TIME_RATING_KEY ={
range(0,46):1,
range(46,91):2,
range(91,136):3,
range(136,201):4,
range(201,10800):5,
}
But found out that we can use range as a key in dict(right?). So is there a better way to do this?
You can implement an interval tree. This kind of data structures are able to return all the intervals that intersect a given input point.
In your case intervals don't overlap, so they would always return 1 interval.
Centered interval trees run in O(log n + m) time, where m is the number of intervals returned (1 in your case). So this would reduce the complexity from O(n) to O(log n).
The idea of these interval trees is the following:
You consider the interval that encloses all the intervals you have
Take the center of that interval and partition the given intervals into those that end before that point, those that contain that point and those that start after it.
Recursively construct the same kind of tree for the intervals ending before the center and those starting after it
Keep the intervals that contain the center point in two sorted sequences. One sorted by starting point, and the other sorted by ending point
When searching go left or right depending on the center point. When you find an overlap you use binary search on the sorted sequence you want to check (this allows for looking up not only intervals that contain a given point but intervals that intersect or contain a given interval).
It's trivial to modify the data structure to return a specific value instead of the found interval.
This said, from the context I don't think you actually need to reduce the efficiency of this lookup and you should probably use the simpler and more readable solution since it would be more maintainable and there are less chances to make mistakes.
However reading about the mroe efficient data structure can turn out useful in the future.
The simplest way is probably just to write a short function:
def convert(n, difficulties=[0, 46, 91, 136, 201]):
if n < difficulties[0]:
raise ValueError
for difficulty, end in enumerate(difficulties):
if n < end:
return difficulty
else:
return len(difficulties)
Examples:
>>> convert(32)
1
>>> convert(68)
2
>>> convert(150)
4
>>> convert(250)
5
As a side note: You can use a range as a dictionary key in Python 3.x, but not directly in 2.x (because range returns a list). You could do:
TIME_RATING_KEY = {tuple(range(0, 46)): 1, ...}
However that won't be much help!

Categories

Resources