Tips on improving this function?

This may be quite a green question, but I hope you understand – just started on python and trying to improve. Anyways, wrote a little function to do the "Shoelace Method" of finding the area of a polygon in a Cartesian plane (see this for a refresher).
I want to know how can I improve my method, so I can try out fancy new ways of doing the same old things.
def shoelace(list):
    r_p = 0 # Positive Values
    r_n = 0 # Negative Values
    x, y = [i[0] for i in list], [i[1] for i in list]
    x.append(x[0]), y.append(y[0])
    print(x, y)
    for i in range(len(x)):
        if (i+1) < len(x):
            r_p += (x[i] * y[i+1])
            r_n += (x[i+1] * y[i])
        else:
            break
    return ((abs(r_p - r_n))/2)

Don't use short variable names that need to be commented; use names that indicate the function.
list is the name of the built-in list type, so while Python will let you replace that name, it's a bad idea stylistically.
A comma (,) should not be used to separate what are supposed to be statements. You can use ;, but it's generally better to just put things on separate lines. In your case, it happens to work because you are using .append for the side effect, but basically what you are doing is constructing the 2-tuple (None, None) (the return values from .append) and throwing it away.
Use built-in functions where possible for standard list transformations. See the documentation for zip, for example. Except you don't really need to perform this transformation; you want to consider pairs of adjacent points, so do that - and take apart their coordinates inside the loop.
However, you can use zip to transform the list of points into a list of pairs-of-adjacent-points :) which lets you write a much cleaner loop. The idea is simple: first, we make a list of all the "next" points relative to the originals, and then we zip the two point-lists together.
return is not a function, so the thing you're returning does not need surrounding parentheses.
Instead of tallying up separate positive and negative values, perform signed arithmetic on a single value.
def shoelace(points):
    signed_double_area = 0
    next_points = points[1:] + points[:1]
    for begin, end in zip(points, next_points):
        begin_x, begin_y = begin
        end_x, end_y = end
        signed_double_area += begin_x * end_y
        signed_double_area -= end_x * begin_y
    return abs(signed_double_area) / 2
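As a quick sanity check, the sketch below repeats the zip-based approach and runs it on two shapes whose areas are easy to verify by hand. The sample point lists are made up for illustration, and the code assumes Python 3, where / is true division (on Python 2 you would divide by 2.0 instead).

```python
def shoelace(points):
    """Area of a simple polygon given as a list of (x, y) vertices in order."""
    signed_double_area = 0
    # Pair every vertex with its successor; the last vertex wraps to the first.
    next_points = points[1:] + points[:1]
    for (begin_x, begin_y), (end_x, end_y) in zip(points, next_points):
        signed_double_area += begin_x * end_y - end_x * begin_y
    return abs(signed_double_area) / 2


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]  # 4 x 3 rectangle (illustrative data)
triangle = [(0, 0), (4, 0), (0, 3)]           # right triangle with legs 4 and 3

print(shoelace(rectangle))  # 12.0
print(shoelace(triangle))   # 6.0
```

Because of the abs() at the end, listing the vertices clockwise or counter-clockwise gives the same area.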

Functionally, your program is quite good. One minor remark, if you are on Python 2, is to replace range(len(x)) with xrange(len(x)), which makes the program slightly more efficient. In Python 2 you should generally use range only when you actually need the full list of values it creates; if all you need is to loop over those values, use xrange. In Python 3, range is already lazy and xrange no longer exists, so no change is needed there.
Also, you don't need the parentheses in the return statement, nor in the r_p += and r_n += statements.
Regarding style, in Python variable assignments shouldn't be done like you did, but rather with a single space on each side of the = symbol:
r_p = 0
r_n = 0

Related

Is there a more efficient and robust way to create a minimum proximity algorithm for a distance matrix?

I am trying to make an algorithm that propagates from point to point in a distance matrix using the smallest distance in the proximity. The code has two conditions: the minimum distance must be no less than 0 and each point must be visited once and return to the starting position.
This is my code in its entirety:
def totalDistance(aList):
    path = []
    for j in range(0,len(aList)):
        k=j
        order = []
        for l in range(0,len(aList)):
            order.append(k)
            initval= min(x for x in aList[k] if x > 0 )
            k = aList[k].index(initval)
            for s in range(0,len(aList)):
                for t in range(0,len(aList[s])):
                    aList[s][k] = 0
        path.append(order)
    return path
The code is meant to return the indexes of the points within the closest proximity of the evaluated point.
aList = [[0,3,4,6],[3,0,7,3],[4,7,0,9],[6,3,9,0]] and represents the distance matrix.
When running the code, I get the following error:
initval= min(x for x in aList[k] if x > 0 )
ValueError: min() arg is an empty sequence
I presume that when I make the columns in my distance matrix zero with the following function:
for s in range(0,len(aList)):
    for t in range(0,len(aList[s])):
        aList[s][k] = 0
the min() function is unable to find a value with the given conditions. Is there a better way to format my code such that this does not occur, or a better approach to this problem altogether?

One technique and a pointer on the rest that you say is working...
For preventing re-visiting / backtracking. One of the common design patterns for this is to keep a separate data structure to "mark" the places you've been. Because your points are numerically indexed, you could use a list of booleans, but I think it is much easier to just keep a set of the places you've been. Something like this...
visited = set() # places already seen
# If I decide to visit point/index "3"...
visited.add(3)
It's not really great practice to modify your input data as you are doing, and especially so if you are looping over it, which you are... it leads to headaches.
So then... Your current error occurs because, when you screen the rows for x > 0, you eventually get an empty sequence (you keep zeroing values), and then min() chokes. Marking visited points as above fixes that: you don't need to zero anything out, just mark them.
Then, the obvious question... how to use the marks? You can just use them as part of your search. This works well with the built-in enumerate function, which yields each index together with its value.
Try something like this, which will make a list of "eligible" tuples with the distance and index location.
pts_to_consider = [(dist, idx) for idx, dist in enumerate(aList[k])
                   if dist > 0
                   and idx not in visited]
There are other ways to do this with numpy and other things, but this is a reasonable approach and close to what you have in code now. Comment back if stuck. I don't want to give away the whole farm because this is probably H/W. Perhaps you can use some of the hints here.
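For readers who want to see the hints combined, here is a minimal sketch under some assumptions: the function name nearest_neighbor_order is made up for illustration, the matrix is the one from the question, and ties are broken by the lower index simply because tuples compare element by element. It only builds the visiting order and never modifies the input; returning to the start and summing distances are left out.

```python
def nearest_neighbor_order(dist, start):
    """Greedy visiting order over a distance matrix, never revisiting a point."""
    visited = {start}  # places already seen
    order = [start]
    current = start
    while len(visited) < len(dist):
        # Eligible (distance, index) pairs: positive distance, not yet visited.
        candidates = [(d, idx) for idx, d in enumerate(dist[current])
                      if d > 0 and idx not in visited]
        if not candidates:  # nothing reachable is left, so stop instead of crashing
            break
        _, current = min(candidates)
        visited.add(current)
        order.append(current)
    return order


aList = [[0, 3, 4, 6], [3, 0, 7, 3], [4, 7, 0, 9], [6, 3, 9, 0]]
print(nearest_neighbor_order(aList, 0))  # [0, 1, 3, 2]
print(nearest_neighbor_order(aList, 1))  # [1, 0, 2, 3]
```

Since the matrix is left untouched, the function can be called once per starting point without the rows emptying out.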

How to apply recursion function in land subdivision?

I've made a subdivision code that allows division of a polygon by bounding box method. subdivision(coordinates) results in subblockL and subblockR (left and right). If I want to repeat this subdivision code until it reaches the area less than 200, I would need to use recursion method.
ex:
B = subdivision(A)[0], C = subdivision(B)[0], D = subdivision(C)[0]... until it reaches the area close to 200. (in other words,
subdivision(subdivision(subdivision(A)[0])[0])[0]...)
How can I simplify the repetition of subdivision? And how can I apply subdivision to every block instead of a single block?
while area(subdivision(A)[0]) < 200:
    for i in range(A):
        subdivision(i)[0]
def sd_recursion(x):
    if x == subdivision(A):
        return subdivision(A)
    else:
        return
I'm not sure what function to put in

"What function to put in" is the function itself; that's the definition of recursion.
def sd_recursive(coordinates):
    if area(coordinates) < 200:
        return [coordinates]
    else:
        a, b = subdivision(coordinates)
        return sd_recursive(a) + sd_recursive(b)  # list combination, not arithmetic addition
To paraphrase, if the area is less than 200, simply return the polygon itself. Otherwise, divide the polygon into two parts, and return ... the result of applying the same logic to each part in turn.
Recursive functions are challenging because recursive functions are challenging. Until you have wrapped your head around this apparently circular argument, things will be hard to understand. The crucial design point is to have a "base case" which does not recurse, which in other words escapes the otherwise infinite loop of the function calling itself under some well-defined condition. (There's also indirect recursion, where X calls Y which calls X which calls Y ...)
If you are still having trouble, look at one of the many questions about debugging recursive functions. For example, Understanding recursion in Python
I assumed the function should return a list in every case, but there are multiple ways to arrange this, just so long as all parts of the code obey the same convention. Which way to prefer also depends on how the coordinates are represented and what's convenient for your intended caller.
(In Python, ['a'] + ['b'] returns ['a', 'b'] so this is not arithmetic addition of two lists, it's just a convenient way to return a single list from combining two other lists one after the other.)
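To make the recursion concrete, here is a self-contained sketch. The question does not show area() or subdivision(), so the versions below are stand-ins invented for illustration: a block is an axis-aligned rectangle (x0, y0, x1, y1), and subdivision() halves it across its longer side. The limit parameter is also an addition, defaulting to the 200 from the question.

```python
def area(rect):
    """Area of an axis-aligned rectangle (x0, y0, x1, y1). Hypothetical stand-in."""
    x0, y0, x1, y1 = rect
    return (x1 - x0) * (y1 - y0)


def subdivision(rect):
    """Split a rectangle in half across its longer side. Hypothetical stand-in."""
    x0, y0, x1, y1 = rect
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        return (x0, y0, mid, y1), (mid, y0, x1, y1)
    mid = (y0 + y1) / 2
    return (x0, y0, x1, mid), (x0, mid, x1, y1)


def sd_recursive(coordinates, limit=200):
    if area(coordinates) < limit:  # base case: small enough, stop recursing
        return [coordinates]
    a, b = subdivision(coordinates)
    return sd_recursive(a, limit) + sd_recursive(b, limit)


blocks = sd_recursive((0, 0, 40, 20))
print(len(blocks))                # 8
print({area(b) for b in blocks})  # {100.0}
```

Note that a block of exactly 200 is split once more, because the base case tests for strictly less than the limit.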
Recursion can always be unrolled; the above can be refactored to
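def sd_unrolled(coordinates):
    # Here coordinates is a list of polygons still waiting to be processed
    result = []
    while coordinates:
        if area(coordinates[0]) < 200:
            result.append(coordinates[0])  # keep the polygon whole, as sd_recursive does
            coordinates = coordinates[1:]
            continue  # do not try to split the polygon we just finished with
        a, b = subdivision(coordinates[0])
        coordinates = [a, b] + coordinates[1:]
    return result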
This is tricky in its own right (but could perhaps be simplified by introducing a few temporary variables) and pretty inefficient or at least inelegant as we keep on copying slices of the coordinates list to maintain the tail while we keep manipulating the head (the first element of the list) by splitting it until each piece is small enough.

Python Trig Functions Return Complex Numbers?

I am writing code that accepts the degree by which a motor turns and uses that data to calculate the distance covered by the wheels (using distance = no. of rotations * distance covered per rotation).
It then makes an error adjustment (taking into consideration environmental factors such as friction).
Finally, using trigonometry, it calculates the distance moved along the x-axis and y-axis.
All the above is done by the function straight contained within the class CoordinateManager. This function is called by an instance of another class.
class CoordinateManager:
    goalcord = [20, 0]
    def __init__(self):
        self.curcord = [0, 0]
        self.theta = 0
    def get_compass_angle(self):
        compass = Sensor(address='in2')
        return compass.value(0)
    def turn(self, iangle, fangle):
        self.theta = self.theta + (fangle-iangle)
    def straight(self, turnangle):
        d = turnangle*2*3.14*2/360
        d = 1.8120132*(d**0.8938054)
        thetarad = radians(self.theta)
        dx = d*sin(thetarad)
        dy = d*cos(thetarad)
        self.curcord[0] += dx
        self.curcord[1] += dy
Printing both d and self.theta shows that they contain correct values.
This must mean that the array self.curcord has valid values too. However, this has not been the case. Printing the two elements of self.curcord outputs complex numbers (some big float + another big floatj).
I can think of no logical explanation for this other than that the trigonometric functions must be returning complex numbers. However, I think the chances that a python built-in lib function returns wrong values are extraordinarily slim.
Is there any logical error that I may be overlooking?
Edit: I just tried changing the last two lines to:
self.curcord[0] += dx
self.curcord[1] += dy
I just tried using .real when displaying the values. Even though the values are real now, they are still wrong. I will look further into whether this is caused by some calculation error.

Since you said in the comments above that turnangle can be any integer, the problem can be directly traced to this line:
d = 1.8120132*(d**0.8938054)
Since turnangle can be negative, the value of d before this line is executed can also be negative; a negative value raised to an arbitrary decimal power is in general complex.
Therefore the problem does not lie with the trig functions at all. The above also leads me to believe that when you said
Printing both d and self.theta shows that they contain correct values
... you only did so after this line:
d = turnangle*2*3.14*2/360
This would explain why you wrongly thought the problem must lie elsewhere.
UPDATE:
It is a very bad habit to set a variable to some function of itself like you did. Try to use a different variable name to avoid confusion - as you saw above I had to refer to "this line" rather than by their variable names.
Perhaps something like this would work, assuming that the behaviour of the motor is the same regardless of the sign of turnangle?
d = math.copysign(1.8120132 * (abs(d) ** 0.8938054), d)  # needs "import math"; the result keeps the sign of d
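A minimal sketch of the point above, assuming Python 3 (where a negative float raised to a fractional power evaluates to a complex number rather than raising an error). The function name corrected_distance and its parameters are made up for illustration; the constants are the ones from the question, and math.copysign is used to carry the sign of the input over to the result.

```python
import math


def corrected_distance(raw_distance, scale=1.8120132, exponent=0.8938054):
    """Apply the power-law correction to the magnitude, then restore the sign."""
    magnitude = scale * abs(raw_distance) ** exponent
    return math.copysign(magnitude, raw_distance)


# The original expression silently becomes complex for negative input:
print(type((-2.0) ** 0.8938054))  # <class 'complex'>

# The sign-preserving version stays real and symmetric around zero:
print(corrected_distance(2.0))
print(corrected_distance(-2.0))
```

Whether a negative turn angle should really mirror the positive case is the same assumption about the motor that the answer makes.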

Arrays returning as empty when using an 'if statement' after glob.glob on FITS Files

I am using glob.glob to make my script read data only from certain FITS files (astropy.io.fits is brought in as pf and numpy as np). For this, x is the value that I change to get these certain files. (For reference, x = np.arange(0) and y1 = np.arange(0) simply create empty arrays that I then fill with data later.)
def Graph(Pass):
    x = np.arange(0)
    y1 = np.arange(0)
    pathfile = '*_v0' + str(Pass) + '_stis_f25srf2_proj.fits'
    for name in glob.glob(pathfile):
        imn = 'FilePath' + str(name)
However, I wanted to add another filter to the files that I use. In each FITS file's header there is a quantity I will call a. a is a non-integer numerical value that each file has. I only want to read files that have a within a specific range. I then take the data I need from the FITS file and add it to an array (here it is 'power' p1 being added to y1 and 'time' t being added to x).
        imh = pf.getheader(imn)
        a = imh['a']
        if (192 <= a <= 206) is False:
            pass
        if (192 <= a <= 206) is True:
            im = pf.getdata(imn, origin='lower')
            subim1 = im[340:390, 75:120]
            p1 = np.mean(subim1)
            t = SubfucntionToGetTime
            y1 = np.append(y1, p1)
            x = np.append(x, t)
However, when I run this function it returns arrays with no values. I believe it has something to do with my code not working properly when it encounters a file without the appropriate a value, but I don't know how to fix this.
For additional reference, I have tested this on a smaller subgroup of FITS files that I know have the correct a values and it works fine. That is why I suspect it is encountering a values that mess up the code, as the first few files don't have the correct a values.

There's a lot going on here, and the code you posted isn't even valid (has indentation errors). I don't think there's a useful question here for Stack Overflow because you're misusing a number of things without realizing it. That said, I want to be helpful so I'm posting an answer instead of just a comment because I format code better in an answer.
First of all, I don't know what you want here:
pathfile = '*_v0' + str(x) + '.fits'
Because before this you have
x = np.arange(0)
So as you can check, str(x) is just a constant--the string '[]'. So you're saying you want a wildcard pattern that looks like '*_v0[].fits' which I doubt is what you want, but even if it is you should just write that explicitly without the str(x) indirection.
Then in your loop over the glob.glob results you do:
imn = 'FilePath' + str(name)
name should already be a string so no need to str(name). I don't know why you're prepending 'FilePath' because glob.glob returns filenames that match your wildcard pattern. Why would you prepend something to the filename, then?
Next you test (192 <= a <= 206) twice. You only need to check this once, and don't use is True and is False. The result of a comparison is already a boolean so you don't need to make this extra comparison.
Finally, there's not much advantage to using Numpy arrays here unless you're looping over thousands of FITS files. But using np.append to grow arrays is very slow since in each loop you make a new copy of the array. For most cases you could use Python lists and then--if desired--convert the list to a Numpy array. If you had to use a Numpy array to start with, you would pre-allocate an empty array of some size using np.zeros(). You might guess a size to start it at and then grow it only if needed. Since you're looping over a list of files you could use the number of files you're looping over, for example.
Here's a rewrite of what I think you're trying to do in more idiomatic Python:
def graph(n_pass):
    x = []
    y1 = []
    for filename in glob.glob('*_v0.fits'):
        header = pf.getheader(filename)
        a = header['a']
        if not (192 <= a <= 206):
            # We don't do any further processing for this file
            # for 'a' outside this range
            continue
        im = pf.getdata(filename, origin='lower')
        subim1 = im[340:390, 75:120]
        p1 = np.mean(subim1)
        t = get_time(...)
        y1.append(p1)
        x.append(t)
You might also consider clearer variable names, etc. I'm sure this isn't exactly what you want to do but maybe this will help give you a little better structure to play with.

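A guess, but is a a string rather than a number? On Python 2 the comparison then silently gives the wrong answer (on Python 3 the first expression raises a TypeError instead):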
>>> 192 <= "200" <= 206
False
>>> 192 <= int("200") <= 206
True

First, ditch the np.append. Use list append instead
x = []
y1 = []
....
y1.append(p1)
x.append(t)
np.arange(0) does create a 0-element array, but you can't fill it. At best it serves to jumpstart the np.append step, which creates a new array with the new values. arr = np.empty((n,), float) makes an n-element array that can be filled with arr[i] = new_value statements.
This will be faster, and should give better information on what is being added. If the x and y1 remain [], then yes, your filtering is skipping this part of the code. I'd also throw in some print statements to be sure. For example replace the pass with a print so you actually see what cases are being rejected.
Without your pf file, or whatever it is, we can't reproduce your problem. We can only suggest ways to find out more about what is going on.
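One way to act on the debugging advice without any FITS files at hand is sketched below in plain Python. Everything here is hypothetical: the function name split_by_range, the file names, and the header values are invented for illustration, and the float() conversion is only there in case the header value arrives as a string, which is a guess rather than something the question confirms.

```python
def split_by_range(records, low=192, high=206):
    """Separate (filename, a) pairs by whether a lies in [low, high]."""
    accepted, rejected = [], []
    for filename, a in records:
        a = float(a)  # guards against a header value stored as a string
        if low <= a <= high:
            accepted.append(filename)
        else:
            # Print instead of a silent pass, so rejected cases are visible.
            print("skipping", filename, "a =", a)
            rejected.append(filename)
    return accepted, rejected


# Made-up (filename, header value) pairs standing in for real FITS headers.
records = [("first.fits", 150.2), ("second.fits", "200"), ("third.fits", 205.9)]
accepted, rejected = split_by_range(records)
print(accepted)  # ['second.fits', 'third.fits']
```

If the accepted list stays empty on the real data, the printed values show directly whether a is out of range or simply not the type you expected.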

Recursion not breaking

I am trying to solve Euler problem 18 where I am required to find out the maximum total from top to bottom. I am trying to use recursion, but am stuck with this.
I guess I didn't state my problem earlier. What I am trying to achieve by recursion is to find the sum of the maximum number path. I start from the top of the triangle, and then check the condition is 7 + findsum() bigger or 4 + findsum() bigger. findsum() is supposed to find the sum of numbers beneath it. I am storing the sum in variable 'result'
The problem is I don't know the breaking case of this recursion function. I know it should break when it has reached the child elements, but I don't know how to write this logic in the program.
pyramid=[[0,0,0,3,0,0,0,],
        [0,0,7,0,4,0,0],
        [0,2,0,4,0,6,0],
        [8,0,5,0,9,0,3]]
pos=[0,3]
def downleft(pyramid,pos):#returns down left child
    try:
        return(pyramid[pos[0]+1][pos[1]-1])
    except:return(0)
def downright(pyramid,pos):#returns down right child
    try:
        return(pyramid[pos[0]+1][pos[1]+1])
    except:
        return(0)
result=0
def find_max(pyramid,pos):
    global result
    if downleft(pyramid,pos)+find_max(pyramid,[pos[0]+1,pos[1]-1]) > downright(pyramid,pos)+find_max(pyramid,[pos[0]+1,pos[1]+1]):
        new_pos=[pos[0]+1,pos[1]-1]
        result+=downleft(pyramid,pos)+find_max(pyramid,[pos[0]+1,pos[1]-1])
    elif downright(pyramid,pos)+find_max(pyramid,[pos[0]+1,pos[1]+1]) > downleft(pyramid,pos)+find_max(pyramid,[pos[0]+1,pos[1]-1]):
        new_pos=[pos[0]+1,pos[1]+1]
        result+=downright(pyramid,pos)+find_max(pyramid,[pos[0]+1,pos[1]+1])
    else :
        return(result)
find_max(pyramid,pos)

A big part of your problem is that you're recursing a lot more than you need to. You should really only ever call find_max twice recursively, and you need some base-case logic to stop after the last row.
Try this code:
def find_max(pyramid, x, y):
    if y >= len(pyramid):  # base case, we're off the bottom of the pyramid
        return 0           # so, return 0 immediately, without recursing
    left_value = find_max(pyramid, x - 1, y + 1)   # first recursive call
    right_value = find_max(pyramid, x + 1, y + 1)  # second recursive call
    if left_value > right_value:
        return left_value + pyramid[y][x]
    else:
        return right_value + pyramid[y][x]
I changed the call signature to have separate values for the coordinates rather than using a tuple, as this made the indexing much easier to write. Call it with find_max(pyramid, 3, 0), and get rid of the global pos list. I also got rid of the result global (the function returns the result).
This algorithm could benefit greatly from memoization, as on bigger pyramids you'll calculate the values of the lower-middle areas many times. Without memoization, the code may be impractically slow for large pyramid sizes.
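A minimal sketch of the memoization idea, assuming the zero-padded layout from the question and non-negative entries. The names max_path_sum and best_from are made up for illustration, and functools.lru_cache is just one convenient way to cache results; positions outside a row are treated as contributing 0, which is only safe because no entry is negative.

```python
from functools import lru_cache


def max_path_sum(pyramid, start_x):
    """Largest top-to-bottom total, moving down-left or down-right each step."""
    last_row = len(pyramid) - 1

    @lru_cache(maxsize=None)  # each (x, y) position is evaluated only once
    def best_from(x, y):
        if x < 0 or x >= len(pyramid[y]):  # explicit bounds check
            return 0
        here = pyramid[y][x]
        if y == last_row:  # base case: bottom row, nothing below to add
            return here
        return here + max(best_from(x - 1, y + 1), best_from(x + 1, y + 1))

    return best_from(start_x, 0)


pyramid = [[0, 0, 0, 3, 0, 0, 0],
           [0, 0, 7, 0, 4, 0, 0],
           [0, 2, 0, 4, 0, 6, 0],
           [8, 0, 5, 0, 9, 0, 3]]
print(max_path_sum(pyramid, 3))  # 23, via 3 + 7 + 4 + 9
```

The cache lives inside max_path_sum, so each call starts fresh and different pyramids cannot contaminate each other's results.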

Edit: I see that you are having trouble with the logic of the code. So let's have a look at that.
At each position in the tree you want to make a choice of selecting the path from this point on that has the highest value. So what you do is, you calculate the score of the left path and the score of the right path. I see this is something you try in your current code, only there are some inefficiencies. You calculate everything twice (first in the if, then in the elif), which is very expensive. You should only calculate the values of the children once.
You ask for the stopping condition. Well, if you reach the bottom of the tree, what is the score of the path starting at this point? It's just the value in the tree. And that is what you should return at that point.
So the structure should look something like this:
function getScoreAt(x, y):
    if at the end: return valueInTree(x, y)
    valueLeft = getScoreAt(x - 1, y + 1)
    valueRight = getScoreAt(x + 1, y + 1)
    valueHere = max(valueLeft, valueRight) + valueInTree(x, y)
    return valueHere
Extra hint:
Are you aware that in Python negative indices wrap around to the back of the array? So if you do pyramid[pos[0]+1][pos[1]-1] you may actually get to elements like pyramid[1][-1], which is at the other side of the row of the pyramid. What you probably expect is that this raises an error, but it does not.
To fix your problem, you should add explicit bound checks and not rely on try blocks (try blocks for this is also not a nice programming style).
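The wrap-around behaviour and an explicit bounds check can be illustrated with a short sketch. The helper name value_at and the sample row are made up for illustration; returning 0 for positions outside the pyramid mirrors what the question's try/except blocks were aiming for.

```python
row = [0, 0, 7, 0, 4, 0, 9]
print(row[-1])  # 9: a negative index counts from the end, no error is raised


def value_at(pyramid, x, y):
    """Value at column x of row y, or 0 when the position is outside the pyramid."""
    if 0 <= y < len(pyramid) and 0 <= x < len(pyramid[y]):
        return pyramid[y][x]
    return 0


small = [[1, 2, 3]]
print(value_at(small, -1, 0))  # 0, instead of silently reading small[0][-1]
print(value_at(small, 1, 0))   # 2
```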
