Two column data with multiple minimums

I am trying to read two-column data and find all of its minima. In the graph of the data, the x axis is time and the y axis is flux. The data can be seen at this link:
https://onedrive.live.com/redir?resid=1E870F010DBA8407!298&authkey=!ABdG6FJ_i3d9oWI&ithint=file%2ctxt
I couldn't find a suitable algorithm. I also tried to fit a curve so that the minima would be easy to identify, but the results weren't correct. Which statistical method is suitable for this job? I used Python and C.
I'll be happy if you share your ideas.

The first thing to do is to sort the list of points along the x axis; otherwise it is going to be an absolute pain. Then you can use:
minima_indices = [i for i in range(1, len(y_list) - 1) if y_list[i-1] >= y_list[i] <= y_list[i+1]]
This should give you the indices of the minima in the sorted list. Note that it omits the first and last points; if you want them to be included, this can be done easily.
If you only want the deep minima in your graph, then you can filter out all the small minima at the end (or during the original list comprehension by adding a condition):
def approx(a, b, tol):
    # True when a and b differ by less than tol
    return abs(a - b) < tol
minima_indices_filtered = [i for i in minima_indices if not approx(y_list[i], y_0, tol)]
y_0 is the value of the flat line at the top of your picture, and tol is how deep a minimum has to be before it registers as a minimum.
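As a rough illustration of the two steps above, here is a minimal self-contained sketch. The data values, the baseline `y_0 = 1.0`, and the tolerance `tol = 0.1` are made up for the example and are not taken from the linked file; `find_minima` and `deep_minima` are hypothetical helper names.

```python
def find_minima(y_list):
    """Indices of interior points that are not higher than either neighbour."""
    return [i for i in range(1, len(y_list) - 1)
            if y_list[i - 1] >= y_list[i] <= y_list[i + 1]]


def deep_minima(y_list, y_0, tol):
    """Keep only minima that lie at least tol away from the baseline y_0."""
    return [i for i in find_minima(y_list) if abs(y_list[i] - y_0) >= tol]


# Made-up (time, flux) samples: two deep dips and two shallow wiggles.
times = list(range(13))
flux = [1.0, 0.98, 1.0, 0.6, 0.2, 0.6, 1.0, 0.99, 1.0, 0.5, 0.3, 0.7, 1.0]

# Sort by time first, as suggested above (a no-op for this sample).
points = sorted(zip(times, flux))
flux_sorted = [f for _, f in points]

all_minima = find_minima(flux_sorted)
deep = deep_minima(flux_sorted, y_0=1.0, tol=0.1)
```

One caveat with the `>=` / `<=` comparison: on a perfectly flat stretch every point counts as a minimum, which is another reason the baseline filter is useful.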

Related

Is there a more efficient and robust way to create a minimum proximity algorithm for a distance matrix?

I am trying to make an algorithm that propagates from point to point in a distance matrix, always moving to the closest point. The code has two conditions: the minimum distance must be greater than 0, and each point must be visited once before returning to the starting position.
This is my code in its entirety:
def totalDistance(aList):
    path = []
    for j in range(0,len(aList)):
        k=j
        order = []
        for l in range(0,len(aList)):
            order.append(k)
            initval= min(x for x in aList[k] if x > 0 )
            k = aList[k].index(initval)
            for s in range(0,len(aList)):
                for t in range(0,len(aList[s])):
                    aList[s][k] = 0
        path.append(order)
    return path
The code is meant to return the indexes of the points in the closest proximity to the evaluated point.
aList = [[0,3,4,6],[3,0,7,3],[4,7,0,9],[6,3,9,0]] and represents the distance matrix.
When running the code, I get the following error:
initval= min(x for x in aList[k] if x > 0 )
ValueError: min() arg is an empty sequence
I presume that when I make the columns in my distance matrix zero with the following loop:
for s in range(0,len(aList)):
    for t in range(0,len(aList[s])):
        aList[s][k] = 0
the min() function is unable to find a value that meets the given condition. Is there a better way to structure my code so that this does not occur, or a better approach to this problem altogether?

One technique and a pointer on the rest that you say is working...
To prevent re-visiting / backtracking, one of the common design patterns is to keep a separate data structure to "mark" the places you've been. Because your points are numerically indexed, you could use a list of booleans, but I think it is much easier to just keep a set of the places you've been. Something like this...
visited = set() # places already seen
# If I decide to visit point/index "3"...
visited.add(3)
It is not really great practice to modify your input data as you are doing, and especially so if you are looping over it, which you are. It leads to headaches.
So then... Your current error occurs because, as you keep zeroing values, screening a row for x > 0 eventually yields an empty sequence, and min() chokes on it. The set above fixes that: you don't need to zero anything out, just mark the points.
Then, the obvious question: how do you use the marks? You can make them part of your search. This works well with enumerate, which returns both the index and the value.
Try something like this, which will make a list of "eligible" tuples with the distance and index location.
pts_to_consider = [(dist, idx) for idx, dist in enumerate(aList[k])
                   if dist > 0
                   and idx not in visited]
There are other ways to do this with numpy and other tools, but this is a reasonable approach and close to what you have in your code now. Comment back if you get stuck. I don't want to give away the whole farm because this is probably homework. Perhaps you can use some of the hints here.
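For readers who want to see the marking idea in one piece, here is a minimal sketch of a nearest-neighbour walk that uses a `visited` set instead of zeroing the matrix. The function name `nearest_neighbour_tour` and the tie-breaking rule (lowest index wins when distances are equal) are assumptions made for the example, not part of the original code.

```python
def nearest_neighbour_tour(dist, start):
    """Greedy walk: always move to the closest unvisited point, then return."""
    visited = {start}          # places already seen
    order = [start]
    k = start
    while len(visited) < len(dist):
        # Eligible (distance, index) pairs: positive distance, not yet visited.
        candidates = [(d, idx) for idx, d in enumerate(dist[k])
                      if d > 0 and idx not in visited]
        if not candidates:     # nothing reachable is left, stop early
            break
        _, k = min(candidates)  # smallest distance, ties broken by index
        visited.add(k)
        order.append(k)
    order.append(start)        # close the loop back to the starting point
    return order


aList = [[0, 3, 4, 6], [3, 0, 7, 3], [4, 7, 0, 9], [6, 3, 9, 0]]
tours = [nearest_neighbour_tour(aList, j) for j in range(len(aList))]
```

Because the matrix is never modified, the same `aList` can be reused for every starting point.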

Numpy notation to replace an enumerate(zip(....))

I'm starting to use numpy. I get the slice notations and element-wise computations, but I can't understand this:
for i, (I,J) in enumerate(zip(data_list[0], data_list[1])):
    joint_hist[int(np.floor(I/self.bin_size))][int(np.floor(J/self.bin_size))] += 1
Variables:
data_list contains two np.array().flatten() images (eventually more)
joint_hist[] is the joint histogram of those two images, it's displayed later with plt.imshow()
bin_size is the number of slots in the histogram
I can't understand why the coordinate in the final histogram is I,J. So it's not just that the value at a position in joint_hist[] is the result of some slicing/element-wise computation. I need to take the result of that computation and use THAT as the indices in joint_hist...
EDIT:
I indeed do not use the i in the loop. It's a leftover from previous iterations, and I simply hadn't noticed that I didn't need it anymore.
I do want to remain in control of the bin sizes and the details of how this is done, so I'm not particularly looking to use np.histogram2d. I will later be using this for further image processing, so I'd rather have the flexibility to adapt my approach than have to figure out if/how to do particular things with built-in functions.

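You can indeed gussy up that for loop using some numpy notation. Assuming you don't actually need i (since it isn't used anywhere), and that data_list is a 2D numpy array (a plain Python list has no .T, so wrap it with np.asarray first):
for I,J in (data_list.T // self.bin_size).astype(int):
    joint_hist[I, J] += 1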
Explanation
data_list.T flips data_list on its side. Each row of data_list.T will contain the data for the pixels at a particular coordinate.
data_list.T // self.bin_size will produce the same result as np.floor(I/self.bin_size), only it will operate on all of the pixels at once, instead of one at a time.
.astype(int) does the same thing as int(...), but again operates on the entire array instead of a single element.
When you iterate over a 2D array with a for loop, the rows are returned one at a time. Thus, the for I,J in arr syntax will give you back one pair of pixels at a time, just like your zip statement did originally.
Alternative
You could also just use histogramdd to calculate joint_hist, in place of your for loop. For your application it would look like:
import numpy as np
joint_hist,edges = np.histogramdd(data_list.T)
This would have different bins than the ones you specified above, though (numpy would determine them automatically).
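If you want to keep your own binning but drop the Python loop entirely, one option is `np.add.at`, which accumulates correctly even when the same (row, column) pair occurs many times. The sketch below uses random stand-in data and treats `bin_size` as a bin width of 16 intensity levels; both are assumptions made for the example.

```python
import numpy as np

# Stand-in data: two flattened 8-bit "images" of 1000 pixels each.
rng = np.random.default_rng(0)
data_list = np.stack([rng.integers(0, 256, size=1000),
                      rng.integers(0, 256, size=1000)])

bin_size = 16                 # assumed bin width, in intensity levels
n_bins = 256 // bin_size      # number of bins along each axis

# Bin index of every pixel in both images, computed in one step.
idx = data_list // bin_size

# Vectorised accumulation: add 1 at each (I, J) pair, repeats included.
joint_hist = np.zeros((n_bins, n_bins), dtype=int)
np.add.at(joint_hist, (idx[0], idx[1]), 1)

# Reference result from the explicit loop, for comparison.
loop_hist = np.zeros_like(joint_hist)
for I, J in idx.T:
    loop_hist[I, J] += 1
```

Plain fancy-index assignment such as `joint_hist[idx[0], idx[1]] += 1` would count each repeated pair only once, which is why `np.add.at` is used here.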

If I understand correctly, your goal is to make a histogram of correlated values in your images? To get the right bin index, the computation that you used is not valid. Instead of np.floor(I/self.bin_size), use np.floor(I/(I_max/bin_size)).astype(int). You want to divide I and J by their respective resolutions. The result will be a diagonal matrix for joint_hist if both data_list[0] and data_list[1] are the same flattened image.
So, all put together:
import numpy as np
bin_size = 256  # number of bins along each axis
I_max = data_list[0].max()+1
J_max = data_list[1].max()+1
joint_hist = np.zeros((bin_size, bin_size))  # one cell per pair of bin indices
for I, J in zip(data_list[0], data_list[1]):
    joint_hist[np.floor(I / (I_max / bin_size)).astype(int), np.floor(J / (J_max / bin_size)).astype(int)] += 1

Appending line points with same slope to Python dict

I am trying to write a function that outputs all the points lying on the same line. I determine that from the fact that the slope between any two pairs of those points must be the same.
I have iterated through the input file to get a list of points and calculated the slopes. My next step is to put them in a HashMap (a dict in Python), with the slope as the key and the points as the value. If the slope for two points is already present, the points should be added to the same entry and any duplicates removed.
I was able to read the input and calculate the slopes. However, putting them in the hashmap is a bit challenging for me, as I keep trying to use the Java-like syntax I am familiar with.
Can someone help me update the hashmap while ensuring that no duplicates are inserted?
Here is what I have done so far:
slopeMap = {}
for x in range (0, len(arr)):
    for y in range (x+1, len(arr)):
        slopeForPoints = (slope(arr[x][0], arr[y][0], arr[x][1], arr[y][1]))
        if slopeMap.has_key(slopeForPoints) == False:
            slopeMap[slopeForPoints].append()
        "slopeForPoints" in slopeMap
        slopeMap["slopeForPoints"] =
        a.setdefault("somekey",[]).append("bob")
        print slopeForPoints
I just need help with the above function. Slope and iterate function I was able to get working.
Sample slope values (Key- HashMap)
0.0
1.0
0.0
0.9
Sample point values (Value - HashMap)
0.0,0.0
1.1,1.1
3.5,4.5
2.2,2.2

As mentioned by Mad Physicist, you need to calculate more than just the slope to identify unique lines, as parallel lines have the same slope but are not necessarily the same line.
There are a few options for this. One is to make the keys of your dictionary tuples, such as (slope, intercept). Then, to make sure the points are unique, you could make the values of your dictionary sets of tuples.
The idea would look something like this:
slope, intercept = slope_intercept(point1, point2)  # each point is (point_x, point_y)
# you need to write the slope_intercept function yourself
if (slope, intercept) not in slopeMap:
    slopeMap[(slope, intercept)] = set()  # could be done with a collections.defaultdict instead
slopeMap[(slope, intercept)].add(point1)
slopeMap[(slope, intercept)].add(point2)
Note that it's more Pythonic to say
if slopeForPoints not in slopeMap:
than to compare has_key() with False (has_key() was also removed in Python 3).
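Here is one way the (slope, intercept) idea could be fleshed out, as a sketch rather than a definitive implementation. The helper names `slope_intercept` and `group_by_line` are placeholders, vertical lines are keyed as `(math.inf, x)` by assumption, and the points are assumed to be distinct. Floating-point slopes are compared exactly here, so real data may need rounding or exact fractions.

```python
from collections import defaultdict
import math


def slope_intercept(p1, p2):
    """Return a hashable key identifying the line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Vertical line: no finite slope, so key it by its x position.
        return (math.inf, x1)
    slope = (y2 - y1) / (x2 - x1)
    return (slope, y1 - slope * x1)


def group_by_line(points):
    """Map each line key to the set of points found on that line."""
    lines = defaultdict(set)   # missing keys start as an empty set
    for i, p1 in enumerate(points):
        for p2 in points[i + 1:]:
            # Sets ignore duplicates, so re-adding a point is harmless.
            lines[slope_intercept(p1, p2)].update((p1, p2))
    return lines


points = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2)]
lines = group_by_line(points)
```

With these sample points, the two parallel lines of slope 1 end up under different keys because their intercepts differ.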

Change next list element during iteration?

Imagine you have a list of points in 2D space. I am trying to find symmetric points.
To do that, I iterate over my list of points and apply symmetry operations. Suppose I apply one of these operations to the first point, and after the operation it is equal to another point in the list. These two points are symmetric.
What I want is to erase this other point from the list that I am iterating over, so that my iteration variable, say "i", won't take this value, because I already know that it is symmetric with the first point.
I have seen similar posts, but they remove a value that the loop has already visited. What I want is to remove subsequent values.

Add each point that turns out to be symmetric to a set. Since a set keeps only unique elements and lookup is O(1) on average, you can use an if point not in s condition to skip points you have already handled.
if point not in s:
    # test for symmetry
    if symmetric:
        s.add(point)

In general it is a bad idea to remove values from a list you are iterating over. There are, however, other ways to skip the symmetric points. For example, you can check for each point whether you have seen a symmetric one before:
for i, point in enumerate(points):
    if symmetric(point) not in points[:i]:
        # Do whatever you want to do
Here symmetric produces a point according to your symmetry operation. If your symmetry operation connects more than two points, you can do
for i, point in enumerate(points):
    for sympoint in symmetric(point):
        if sympoint in points[:i]:
            break
    else:
        # Do whatever you want to do
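The two suggestions can be combined: keep a set of the points already accepted and skip any point whose symmetric partner is in it. The sketch below assumes a single symmetry operation, reflection across the y axis, and points stored as hashable tuples; `mirror` and `unique_under_symmetry` are hypothetical names chosen for the example.

```python
def mirror(point):
    """Example symmetry operation (an assumption): reflect across the y axis."""
    x, y = point
    return (-x, y)


def unique_under_symmetry(points, symmetric=mirror):
    """Keep one representative of each symmetric pair, in input order."""
    seen = set()   # points already accepted; membership tests are O(1) on average
    kept = []
    for point in points:
        if point in seen or symmetric(point) in seen:
            continue   # this point or its partner was already handled
        seen.add(point)
        kept.append(point)
    return kept


points = [(1, 2), (-1, 2), (3, 0), (0, 5), (-3, 0)]
kept = unique_under_symmetry(points)
```

Nothing is removed from `points` while it is being iterated, and the set lookup avoids rescanning `points[:i]` on every step.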

Plotting two objects using a 4-item list

I have this simulator (gravitation) I've been working on, and I've dissected the equations, math, etc. and it's totally legitimate. However, when I animate the thing I get weird behavior. I'd rather not bore everyone with the entire script because it's sorta lengthy, but the method I'm calling in line.set under the animate(i) function returns a list of four values, which are the positions of my two particles in Cartesian (x,y) coordinates. For example my list looks like:
[1.2, 3.2, 4.5, 5.1]
where the first element is the x-position of the first particle, the second element is its y-position, and likewise the last two elements (indices 2 and 3) correspond to the second particle.
My question is whether line.set_data(force.updatePosition(dt)) should work the way I think it does, i.e. plotting the first particle from indices 0 and 1 and the second particle from indices 2 and 3, or am I missing the point? The plotting works and the particles show up, but their movement is weird and nonsensical.
If it's completely necessary, here is the script in its entirety. Again, it's long-ish, which is why I didn't post it directly. Also, it's pretty messy, as I'm still fighting with it and haven't cleaned it up yet.
TL;DR: Should line.set_data() be able to plot two separate objects if it is fed a list with 4 items?
def init():
    line.set_data([], [])
    return line,
def animate(i):
    line.set_data(force.updatePosition(dt))
    return line,

The docs say:
Definition: l.set_data(self, *args)
Docstring:
Set the x and y data
ACCEPTS: 2D array (rows are x, y) or two 1D arrays
So I imagine you want to give it two lists:
line.set_data([x1, x2], [y1, y2])
But it seems that force.updatePosition already returns a list of two lists ([pos1] + [pos2]), so you could try:
line.set_data(np.transpose(force.updatePosition(dt)))
In my opinion, you would be better off keeping all this information in arrays, which would remove about half the lines of your code, since you currently write every line two or four times, once per element.
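If the update method really does return a flat list in the order [x1, y1, x2, y2], as described in the question, the x and y values can be separated with slicing before they are handed to set_data. This is a small sketch under that assumption; `split_positions` is a hypothetical helper, and the set_data call is left as a comment because it needs the `line` object from the original script.

```python
def split_positions(flat):
    """Split [x1, y1, x2, y2, ...] into ([x1, x2, ...], [y1, y2, ...])."""
    xs = flat[0::2]   # every even index holds an x coordinate
    ys = flat[1::2]   # every odd index holds a y coordinate
    return xs, ys


xs, ys = split_positions([1.2, 3.2, 4.5, 5.1])
# In the animation callback this would become something like:
# line.set_data(xs, ys)
```

The same slicing works unchanged if more particles are added later, since it does not hard-code the number of points.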
