I have a set of at most ~10,000 vectors (random directions) in 3D space, and I'm looking for a new direction v_dev (a vector) that deviates from every direction in the set by at least e.g. 5 degrees. My naive initial try is the following, which of course has bad runtime complexity but succeeds in some cases.
#!/usr/bin/env python
import numpy as np

numVecs = 10000
vecs = np.random.rand(numVecs, 3)
randVec = np.random.rand(1, 3)
notFound = True
foundVec = randVec
below = False
iter = 1

for vec in vecs:
    angle = np.rad2deg(np.arccos(np.vdot(vec, foundVec) / (np.linalg.norm(vec) * np.linalg.norm(foundVec))))
    print("angle: %f\n" % angle)

while notFound:
    below = False  # reset for each candidate, otherwise the loop never terminates
    for vec in vecs:
        angle = np.rad2deg(np.arccos(np.vdot(vec, randVec) / (np.linalg.norm(vec) * np.linalg.norm(randVec))))
        if angle < 5:
            below = True
    if below:
        randVec = np.random.rand(1, 3)
    else:
        notFound = False
    print("iteration no. %i" % iter)
    iter = iter + 1
Any hints on how to approach this problem (language agnostic) would be appreciated.
Consider the vectors in a spherical coordinate system (u, w, r), where r is always 1 because vector length doesn't matter here. Any vector can then be expressed as (u, w), and the "deadzone" around each vector x, in which the target vector t must not fall, can be expressed as dist((u_x, w_x, 1), (u_t, w_t, 1)) < 5°. However, calculating this distance can be a bit tricky, so converting back into Cartesian coordinates might be easier. These deadzones are circles on the spherical shell around the origin, and you're looking for a t that doesn't hit any of them.
For any fixed u_t you can iterate over all x and, using the distance function, find the start and end point of a range of w_t values that are blocked because they fall into the deadzone of the vector x. The complement of the union of all 10,000 blocked ranges gives the possible values of w_t for that given u_t. The same can be done for any fixed w_t, looking for a u_t.
Now comes the part that I'm not entirely sure of: given that you have two unknowns u_t and w_t and 20,000 knowns, the system is just a tad overdetermined, and if there is a solution, it should be possible to find it.
My suggestion: set u_t to a random value and check which w_t are possible. If you find a non-empty range, great, you're done. If all w_t are blocked, select a different u_t and try again. Selecting u_t at random will work eventually, yet a smarter iteration should be possible. Maybe u_t(n) = (u_t(n-1) + 360°/phi) % 360°, where phi is the golden ratio: with that golden-angle stepping the u_t values never repeat and cover the whole range with finer and finer granularity, instead of starting from one end and slowly moving to the other.
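For illustration, here is a rough, discretized sketch of that scheme: u_t is stepped by the golden angle, and for each fixed u_t the candidate w_t values are simply sampled on a fine grid and tested against all vectors in one vectorized operation, rather than via an exact interval union. The function name and parameters are my own, not part of the original suggestion.

import numpy as np

def find_direction(vecs, min_angle_deg=5.0, n_w=720, max_u_steps=1000):
    # Normalize the inputs once; only directions matter.
    unit_vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    cos_limit = np.cos(np.deg2rad(min_angle_deg))
    golden_angle = 2 * np.pi * (1 - 1 / ((1 + 5 ** 0.5) / 2))    # ~137.5 degrees
    w = np.linspace(0, np.pi, n_w)                               # candidate polar angles
    u = 0.0
    for _ in range(max_u_steps):
        # Candidate unit vectors for this fixed u, one per w value (n_w x 3).
        cands = np.column_stack([np.sin(w) * np.cos(u), np.sin(w) * np.sin(u), np.cos(w)])
        # Cosine of the angle between every candidate and every input vector (n_w x N).
        cosines = cands @ unit_vecs.T
        ok = np.all(cosines < cos_limit, axis=1)   # True where no input vector is within 5 degrees
        if ok.any():
            return cands[np.argmax(ok)]            # first feasible candidate for this u
        u = (u + golden_angle) % (2 * np.pi)
    return None                                    # nothing found; probably over-constrained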
Edit: You might also have more luck on the Mathematics Stack Exchange, since this isn't so much a code question as it is a mathematics question. For example, I'm not sure what I wrote is all that rigorous, so I don't even know whether it works.
One way would be to build a 2D manifold (an area on the sphere) of forbidden regions. You start by adding a point; the forbidden area is then a circle on the sphere's surface.
While true, pick a point on the boundary of the area. If it is not close (within 5 degrees) to any other vector, you're done: return it. If it is, you have just found a new circle of forbidden area; add it to your manifold of forbidden areas. You'll need to chop the circle into line or arc segments and maintain the boundary as a list.
If the set of vectors admits no solution, your boundary will collapse to an empty set; in that case, return failure.
It's not the easiest approach, and you'll have to deal with the boundaries of a complex shape over a sphere. But it's guaranteed to work and should have reasonable complexity.
This is a draft of a 3D model I'm working with, and I would like to simulate its behaviour in Python. I have been researching the best implementation for this simulation, but I found nothing that fits the real motion. I have tried solving it analytically and failed because of the uncertainty of certain parameters (measurement errors in the arm lengths).
I want to simulate the motion produced by a revolute joint and transferred to a system similar to the one depicted in the scheme.
At a certain time, the system might actuate the revolute joint and then move to the following state.
Both states of the system are depicted in the next scheme.
An easy simplification with DH parameters would be:
The important thing is how to calculate the positions and the angles of both non-controlled joints, so that the receptor joint angle (at the fixed point) can be calculated.
It is not only an inverse kinematics problem; the motion restrictions must be considered too. The motion is determined by the revolute joint angle, the lengths of the links, and the fixed point's position and link length.
The red circle in the next image depicts the possible positions for the second non-controlled point.
How would you simulate this motion?
There is one problematic position, where the intersection of the two circles (described below) is a single point.
In this situation (assuming a planar case, with gravity perpendicular to the whole arm, and a static situation) there is no force that moves the second non-controlled joint. In a dynamic simulation we simply choose the other solution for the next step.
When no intersection exists, that configuration does not exist and the revolute joint cannot move to that position. We obtain the motion restrictions (trivially) by calculating all positions and marking those where no intersection exists.
Do you obtain the end position of the non-fixed point directly?
Older answer:
Simulate motion:
Calculate the position of the non-controlled points for every time between the start position and the end position, with step delta_t.
Draw each calculated position step by step (for example via Pygame).
Calculate:
First compute the position of the first non-controlled point (the higher one):
x_2 = x_1 + l_12 cos(Theta_1),
y_2 = y_1 + l_12 sin(Theta_1),
where X_1(x_1, y_1) is the position of the revolute point, X_2(x_2, y_2) is the position of the first non-controlled point, and l_12 is the length between X_1 and X_2.
Compute the intersection of the two circles k_1 and k_2, where k_1 = (first non-controlled point, l_23) and k_2 = (receptor joint, l_34), with k = (center of circle, radius of circle).
This step has two solutions; we choose one of them. To simulate motion, we must keep choosing the "same" solution from step to step.
Compute the angle from two points:
alpha = math.atan2(y_2 - y_1, x_2 - x_1)
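For reference, here is a minimal sketch of the circle-circle intersection in step 2 and of keeping the "same solution" between time steps; the helper names are my own, not part of the answer above.

import math

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of circles k1 = (c1, r1) and k2 = (c2, r2); [] if they don't meet."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []                                  # no intersection: position unreachable
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)     # distance from c1 to the chord midpoint
    h = math.sqrt(max(r1 ** 2 - a ** 2, 0.0))      # half chord length (0 in the tangent case)
    xm, ym = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    return [(xm + h * (y2 - y1) / d, ym - h * (x2 - x1) / d),
            (xm - h * (y2 - y1) / d, ym + h * (x2 - x1) / d)]

def pick_same_branch(candidates, previous):
    """Keep the solution closest to the previous time step so the motion stays continuous."""
    return min(candidates, key=lambda p: (p[0] - previous[0]) ** 2 + (p[1] - previous[1]) ** 2)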
In my previous question I got an excellent answer that helped me detect where a paw hit a pressure plate, but now I'm struggling to link these results to their corresponding paws:
I manually annotated the paws (RF=right front, RH= right hind, LF=left front, LH=left hind).
As you can see there's clearly a repeating pattern and it comes back in almost every measurement. Here's a link to a presentation of 6 trials that were manually annotated.
My initial thought was to use heuristics to do the sorting, like:
There's a ~60-40% ratio in weight bearing between the front and hind paws;
The hind paws are generally smaller in surface;
The paws are (often) spatially divided in left and right.
However, I'm a bit skeptical about my heuristics, as they would fail as soon as I encounter a variation I hadn't thought of. They also won't be able to cope with measurements from lame dogs, which probably have rules of their own.
Furthermore, the annotation suggested by Joe sometimes gets messed up and doesn't take into account what the paw actually looks like.
Based on the answers I received on my question about peak detection within the paw, I’m hoping there are more advanced solutions to sort the paws. Especially because the pressure distribution and the progression thereof are different for each separate paw, almost like a fingerprint. I hope there's a method that can use this to cluster my paws, rather than just sorting them in order of occurrence.
So I'm looking for a better way to sort the results with their corresponding paw.
For anyone up to the challenge, I have pickled a dictionary with all the sliced arrays that contain the pressure data of each paw (bundled by measurement) and the slice that describes their location (location on the plate and in time).
To clarify: walk_sliced_data is a dictionary that contains ['ser_3', 'ser_2', 'sel_1', 'sel_2', 'ser_1', 'sel_3'], which are the names of the measurements. Each measurement contains another dictionary with keys [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] (example from 'sel_1'), which represent the impacts that were extracted.
Also note that 'false' impacts, where the paw is only partially measured (in space or time), can be ignored. They are only useful because they can help with recognizing a pattern, but they won't be analyzed.
And for anyone interested, I’m keeping a blog with all the updates regarding the project!
Alright! I've finally managed to get something working consistently! This problem pulled me in for several days... Fun stuff! Sorry for the length of this answer, but I need to elaborate a bit on some things... (Though I may set a record for the longest non-spam stackoverflow answer ever!)
As a side note, I'm using the full dataset that Ivo provided a link to in his original question. It's a series of rar files (one-per-dog) each containing several different experiment runs stored as ascii arrays. Rather than try to copy-paste stand-alone code examples into this question, here's a bitbucket mercurial repository with full, stand-alone code. You can clone it with
hg clone https://joferkington@bitbucket.org/joferkington/paw-analysis
Overview
There are essentially two ways to approach the problem, as you noted in your question. I'm actually going to use both in different ways.
Use the (temporal and spatial) order of the paw impacts to determine which paw is which.
Try to identify the "pawprint" based purely on its shape.
Basically, the first method works when the dog's paws follow the trapezoid-like pattern shown in Ivo's question above, but fails whenever the paws don't follow that pattern. It's fairly easy to programmatically detect when it doesn't work.
Therefore, we can use the measurements where it did work to build up a training dataset (of ~2000 paw impacts from ~30 different dogs) to recognize which paw is which, and the problem reduces to a supervised classification (With some additional wrinkles... Image recognition is a bit harder than a "normal" supervised classification problem).
Pattern Analysis
To elaborate on the first method, when a dog is walking (not running!) normally (which some of these dogs may not be), we expect paws to impact in the order of: Front Left, Hind Right, Front Right, Hind Left, Front Left, etc. The pattern may start with either the front left or front right paw.
If this were always the case, we could simply sort the impacts by initial contact time and use a modulo 4 to group them by paw.
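A minimal sketch of that ideal case (the variable names and the assumption that the time axis is the last slice are mine, not taken from the repository):

def naive_group_paws(impacts):
    """Label impacts purely by temporal order, assuming the LF, RH, RF, LH cycle always holds."""
    impacts = sorted(impacts, key=lambda s: s[-1].start)   # sort by initial contact time
    paw_code = {0: 'LF', 1: 'RH', 2: 'RF', 3: 'LH'}
    return [paw_code[i % 4] for i in range(len(impacts))]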
However, even when everything is "normal", this doesn't work. This is due to the trapezoid-like shape of the pattern. A hind paw spatially falls behind the previous front paw.
Therefore, the hind paw impact after the initial front paw impact often falls off the sensor plate and isn't recorded. Similarly, the last paw impact is often not the next paw in the sequence, as the paw impact before it occurred off the sensor plate and wasn't recorded.
Nonetheless, we can use the shape of the paw impact pattern to determine when this has happened, and whether we've started with a left or right front paw. (I'm actually ignoring problems with the last impact here. It's not too hard to add it, though.)
def group_paws(data_slices, time):
    # Sort slices by initial contact time
    data_slices.sort(key=lambda s: s[-1].start)

    # Get the centroid for each paw impact...
    paw_coords = []
    for x, y, z in data_slices:
        paw_coords.append([(item.stop + item.start) / 2.0 for item in (x, y)])
    paw_coords = np.array(paw_coords)

    # Make a vector between each successive impact...
    dx, dy = np.diff(paw_coords, axis=0).T

    #-- Group paws -------------------------------------------
    paw_code = {0: 'LF', 1: 'RH', 2: 'RF', 3: 'LH'}
    paw_number = np.arange(len(paw_coords))

    # Did we miss the hind paw impact after the first
    # front paw impact? If so, the first dx will be positive...
    if dx[0] > 0:
        paw_number[1:] += 1

    # Are we starting with the left or right front paw...
    # We assume we're starting with the left, and check dy[0].
    # If dy[0] > 0 (i.e. the next paw impacts to the left), then
    # it's actually the right front paw, instead of the left.
    if dy[0] > 0:  # Right front paw impact...
        paw_number += 2

    # Now we can determine the paw with a simple modulo 4..
    paw_codes = paw_number % 4
    paw_labels = [paw_code[code] for code in paw_codes]

    return paw_labels
In spite of all of this, it frequently doesn't work correctly. Many of the dogs in the full dataset appear to be running, and the paw impacts don't follow the same temporal order as when the dog is walking. (Or perhaps the dog just has severe hip problems...)
Fortunately, we can still programmatically detect whether or not the paw impacts follow our expected spatial pattern:
def paw_pattern_problems(paw_labels, dx, dy):
    """Check whether or not the label sequence "paw_labels" conforms to our
    expected spatial pattern of paw impacts. "paw_labels" should be a sequence
    of the strings: "LH", "RH", "LF", "RF" corresponding to the different paws"""
    # Check for problems... (This could be written a _lot_ more cleanly...)
    problems = False
    last = paw_labels[0]
    for paw, dy, dx in zip(paw_labels[1:], dy, dx):
        # Going from a left paw to a right, dy should be negative
        if last.startswith('L') and paw.startswith('R') and (dy > 0):
            problems = True
            break
        # Going from a right paw to a left, dy should be positive
        if last.startswith('R') and paw.startswith('L') and (dy < 0):
            problems = True
            break
        # Going from a front paw to a hind paw, dx should be negative
        if last.endswith('F') and paw.endswith('H') and (dx > 0):
            problems = True
            break
        # Going from a hind paw to a front paw, dx should be positive
        if last.endswith('H') and paw.endswith('F') and (dx < 0):
            problems = True
            break
        last = paw
    return problems
Therefore, even though the simple spatial classification doesn't work all of the time, we can determine when it does work with reasonable confidence.
Training Dataset
From the pattern-based classifications where it worked correctly, we can build up a very large training dataset of correctly classified paws (~2400 paw impacts from 32 different dogs!).
We can now start to look at what an "average" front left, etc, paw looks like.
To do this, we need some sort of "paw metric" that has the same dimensionality for any dog. (In the full dataset, there are both very large and very small dogs!) A paw print from an Irish elkhound will be both much wider and much "heavier" than a paw print from a toy poodle. We need to rescale each paw print so that a) they have the same number of pixels, and b) the pressure values are standardized. To do this, I resampled each paw print onto a 20x20 grid and rescaled the pressure values based on the maximum, minimum, and mean pressure value for the paw impact.
def paw_image(paw):
    from scipy.ndimage import map_coordinates
    ny, nx = paw.shape

    # Trim off any "blank" edges around the paw...
    mask = paw > 0.01 * paw.max()
    y, x = np.mgrid[:ny, :nx]
    ymin, ymax = y[mask].min(), y[mask].max()
    xmin, xmax = x[mask].min(), x[mask].max()

    # Make a 20x20 grid to resample the paw pressure values onto
    numx, numy = 20, 20
    xi = np.linspace(xmin, xmax, numx)
    yi = np.linspace(ymin, ymax, numy)
    xi, yi = np.meshgrid(xi, yi)

    # Resample the values onto the 20x20 grid
    coords = np.vstack([yi.flatten(), xi.flatten()])
    zi = map_coordinates(paw, coords)
    zi = zi.reshape((numy, numx))

    # Rescale the pressure values
    zi -= zi.min()
    zi /= zi.max()
    zi -= zi.mean()  #<- Helps distinguish front from hind paws...
    return zi
After all of this, we can finally take a look at what an average left front, hind right, etc paw looks like. Note that this is averaged across >30 dogs of greatly different sizes, and we seem to be getting consistent results!
However, before we do any analysis on these, we need to subtract the mean (the average paw for all legs of all dogs).
Now we can analyze the differences from the mean, which are a bit easier to recognize:
Image-based Paw Recognition
Ok... We finally have a set of patterns that we can begin to try to match the paws against. Each paw can be treated as a 400-dimensional vector (returned by the paw_image function) that can be compared to these four 400-dimensional vectors.
Unfortunately, if we just use a "normal" supervised classification algorithm (i.e. find which of the 4 patterns is closest to a particular paw print using a simple distance), it doesn't work consistently. In fact, it doesn't do much better than random chance on the training dataset.
This is a common problem in image recognition. Due to the high dimensionality of the input data, and the somewhat "fuzzy" nature of images (i.e. adjacent pixels have a high covariance), simply looking at the difference of an image from a template image does not give a very good measure of the similarity of their shapes.
Eigenpaws
To get around this we need to build a set of "eigenpaws" (just like "eigenfaces" in facial recognition), and describe each paw print as a combination of these eigenpaws. This is identical to principal components analysis, and basically provides a way to reduce the dimensionality of our data, so that distance is a good measure of shape.
Because we have more training images than dimensions (2400 vs 400), there's no need to do "fancy" linear algebra for speed. We can work directly with the covariance matrix of the training data set:
def make_eigenpaws(paw_data):
    """Creates a set of eigenpaws based on paw_data.
    paw_data is a numdata by numdimensions matrix of all of the observations."""
    average_paw = paw_data.mean(axis=0)
    paw_data -= average_paw

    # Determine the eigenvectors of the covariance matrix of the data
    cov = np.cov(paw_data.T)
    eigvals, eigvecs = np.linalg.eig(cov)

    # Sort the eigenvectors by ascending eigenvalue (largest is last)
    eig_idx = np.argsort(eigvals)
    sorted_eigvecs = eigvecs[:, eig_idx]
    sorted_eigvals = eigvals[eig_idx]  # eigvals is 1D, so index it directly

    # Now choose a cutoff number of eigenvectors to use
    # (50 seems to work well, but it's arbitrary...)
    num_basis_vecs = 50
    basis_vecs = sorted_eigvecs[:, -num_basis_vecs:]

    return basis_vecs
These basis_vecs are the "eigenpaws".
To use these, we simply dot (i.e. matrix multiplication) each paw image (as a 400-dimensional vector, rather than a 20x20 image) with the basis vectors. This gives us a 50-dimensional vector (one element per basis vector) that we can use to classify the image. Instead of comparing a 20x20 image to the 20x20 image of each "template" paw, we compare the 50-dimensional, transformed image to each 50-dimensional transformed template paw. This is much less sensitive to small variations in exactly how each toe is positioned, etc, and basically reduces the dimensionality of the problem to just the relevant dimensions.
Eigenpaw-based Paw Classification
Now we can simply use the distance between the 50-dimensional vectors and the "template" vectors for each leg to classify which paw is which:
codebook = np.load('codebook.npy') # Template vectors for each paw
average_paw = np.load('average_paw.npy')
basis_stds = np.load('basis_stds.npy') # Needed to "whiten" the dataset...
basis_vecs = np.load('basis_vecs.npy')
paw_code = {0:'LF', 1:'RH', 2:'RF', 3:'LH'}
def classify(paw):
    paw = paw.flatten()
    paw -= average_paw
    scores = paw.dot(basis_vecs) / basis_stds
    diff = codebook - scores
    diff *= diff
    diff = np.sqrt(diff.sum(axis=1))
    return paw_code[diff.argmin()]
Here are some of the results:
Remaining Problems
There are still some problems, particularly with dogs too small to make a clear pawprint... (It works best with large dogs, as the toes are more clearly separated at the sensor's resolution.) Also, partial pawprints aren't recognized with this system, while they can be with the trapezoidal-pattern-based system.
However, because the eigenpaw analysis inherently uses a distance metric, we can classify the paws both ways, and fall back to the trapezoidal-pattern-based system when the eigenpaw analysis's smallest distance from the "codebook" is over some threshold. I haven't implemented this yet, though.
Phew... That was long! My hat is off to Ivo for having such a fun question!
Using the information based purely on duration, I think you could apply techniques from kinematic modeling, namely inverse kinematics. Combined with orientation, length, duration, and total weight, it gives some level of periodicity which, I would hope, could be a first step in trying to solve your "paw sorting" problem.
All that data could be used to create a list of bounded polygons (or tuples), which you could use to sort by step size then by paw-ness [index].
Can you have the technician running the test manually enter the first paw (or first two)? The process might be:
Show tech the order of steps image and require them to annotate the first paw.
Label the other paws based on the first paw and allow the tech to make corrections or re-run the test. This allows for lame or 3-legged dogs.
I'm trying to solve the following problem:
Given an input of, say,
0000000000000000
0011111111110000
0011111111110000
0011111111110000
0000000000000000
0000000111111110
0000000111111110
0000000000000000
I need to find the width and height of all rectangles in the field. The input is actually a single column at a time (think like a scanner moves from left to right) and is continuous for the duration of the program (that is, the scanning column doesn't move, but the rectangles move over it).
In this example, I can 'wait for a rectangle to begin' (that is, watch for zeros changing to ones), then watch it end (ones changing back to zeros) and measure the piece in 'grid units'. This will work fine for the simple case outlined above, but will fail if the rectangle is tilted at an angle, for example:
0000000000000000
0000011000000000
0000111100000000
0001111111000000
0000111111100000
0000011111110000
0000000111100000
0000000011000000
I had originally thought that the following question would apply:
Dynamic programming - Largest square block
but now i'm not so sure.
I have little to no experience with regression or regression testing, but I think that I could represent this as an input of 8 variables...
Well, to be honest, I'm not sure how I would do this at all. The sizes that this part of the code extracts need to be fitted against rectangles of known sizes (i.e., from a database).
I initially thought I could feed the known data as training exercises and store the positive test results, but I'm really not sure where to go from here.
Thanks for any advice you might have.
Collect the transition points (from a 1 to a 0 or vice-versa) as you're scanning, then figure the length and width either directly from there, or from the convex hull of each object.
If rectangles can overlap, then you'll have bigger issues.
I'd take the following steps:
get all columns together in a matrix (this is needed for proper filtering)
now apply a filter (you may need to google for it a bit) to sharpen edges and corners
create some structure to hold data for the next steps (this can have many different solutions; choose your favorite and/or the optimal one)
scan vertically (column by column), and for each segment of consecutive 'ones' found in a column (a segment meaning you have found its start and end y coordinates), do:
check whether this segment overlaps some segment in the previous column
if it does not, consider this a new rect. Create a rect object and assign its handle to the segment. For the new rect, update its metrics (this operation takes just the segment's coordinates - x, ymin, ymax - and will be discussed later)
if it does, assume this is the same rect: take the rect's handle, assign this handle to the current segment, then get the rect by its handle and update its metrics
That's pretty much it. After this you will have a pool of rect objects, each having the four coordinates of its corners. Do some primitive math to approximate each rect's width and height. (A rough sketch of the scan itself follows below.)
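A rough sketch of that column-by-column scan, under the assumption that the field is already a 0/1 NumPy array; the filtering step and the full metrics bookkeeping are left out, and the names are my own:

import numpy as np

def track_rects(grid):
    """Group 1-pixels into objects by linking overlapping vertical segments column by column."""
    rects = []               # each rect is a list of (x, ymin, ymax) segments
    prev_segments = []       # (ymin, ymax, rect) entries for the previous column
    for x in range(grid.shape[1]):
        col = grid[:, x]
        # Find runs of consecutive ones in this column.
        edges = np.diff(np.concatenate(([0], col, [0])))
        starts, ends = np.where(edges == 1)[0], np.where(edges == -1)[0] - 1
        cur_segments = []
        for ymin, ymax in zip(starts, ends):
            # Does this segment overlap a segment in the previous column?
            owner = None
            for pymin, pymax, rect in prev_segments:
                if ymin <= pymax and pymin <= ymax:
                    owner = rect
                    break
            if owner is None:          # no overlap -> start a new rect
                owner = []
                rects.append(owner)
            owner.append((x, ymin, ymax))
            cur_segments.append((ymin, ymax, owner))
        prev_segments = cur_segments
    return rects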
So where is the magic? Well, it all happens in the update rect metrics routine.
For each rect we have 13 metrics:
min X => ymin1, ymax1
max X => ymin2, ymax2
min Y => xmin1, xmax1
max Y => xmin2, xmax2
average vertical segment length
First of all we have to determine whether this rect is properly aligned with our scan grid. To do this we compare the average vertical segment length with max Y - min Y. If they are the same (I'd choose a threshold around 97%, and then tune it for the best results), then we assume the following coordinates for our rect:
(min X, max Y)
(min X, min Y)
(max X, max Y)
(max X, min Y).
Otherwise our rect is rotated, and in this case we take its coordinates as follows:
(min X, (ymin1+ymax1)/2)
((xmin1+xmax1)/2, min Y)
(max X, (ymin2+ymax2)/2)
((xmin2+xmax2)/2, max Y)
I posed this question to a friend, and he suggested:
When seeing a 1 for the first time, store it as a new shape. Flood fill it to the right, and add those points to the same shape.
Any input pixel that isn't in a shape now starts a new shape. Do the same flood fill.
On the next input column, flood again from the original shape points, adding new pixels to the corresponding shape.
If a flood fill does not add any new pixels for two consecutive columns, you have a completed shape. Move on, and try to determine its dimensions.
This then leaves us with getting the dimensions for a shape we isolated (like in example 2).
For this, we thought up:
If the number of leftmost pixels in the shape is below the average number of pixels per column, then the piece is probably rotated. In that case, find the corners by taking the outermost pixels and use the distance formula between all of them: the largest distance is the hypotenuse (diagonal), the others are the width or height (see the sketch below).
Otherwise, the piece is probably perfectly aligned, so the corners are probably just the top-left-most pixel, the bottom-right-most pixel, etc.
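A quick sketch of that rotated-case corner idea (my own code, not the friend's; it assumes the shape's pixels have already been isolated as (x, y) tuples):

import itertools
import math

def rect_dimensions(pixels):
    """Rough width/height of a rotated rectangular blob from its pixel coordinates."""
    # Outermost pixels as corner candidates: leftmost, rightmost, topmost, bottommost.
    corners = [min(pixels, key=lambda p: p[0]), max(pixels, key=lambda p: p[0]),
               min(pixels, key=lambda p: p[1]), max(pixels, key=lambda p: p[1])]
    dists = sorted(math.dist(a, b) for a, b in itertools.combinations(corners, 2))
    # The sorted pairwise distances are roughly [w, w, h, h, diagonal, diagonal].
    width, height = dists[0], dists[2]
    return width, height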
What do you all think?
I have two dimensional discrete spatial data. I would like to make an approximation of the spatial boundaries of this data so that I can produce a plot with another dataset on top of it.
Ideally, this would be an ordered set of (x,y) points that matplotlib can plot with the plt.Polygon() patch.
My initial attempt is very inelegant: I place a fine grid over the data, and where data is found in a cell, a square matplotlib patch is created of that cell. The resolution of the boundary thus depends on the sampling frequency of the grid. Here is an example, where the grey region are the cells containing data, black where no data exists.
1st attempt http://astro.dur.ac.uk/~dmurphy/data_limits.png
OK, problem solved - why am I still here? Well... I'd like a more "elegant" solution, or at least one that is faster (i.e. I don't want to get on with "real" work; I'd like to have some fun with this!). The best way I can think of is a ray-tracing approach, e.g.:
1. From xmin to xmax, at y = ymin, check whether the data boundary is crossed, in intervals of dx.
2. At y = ymin + dy, do step 1.
3. Do steps 1-2, but now sample in y.
An alternative is defining a centre, and sampling in r-theta space - ie radial spokes in dtheta increments.
Both would produce a set of (x, y) points, but then how do I order/link neighbouring points to create the boundary?
A nearest neighbour approach is not appropriate as, for example (to borrow from Geography), an isthmus (think of Panama connecting N&S America) could then close off and isolate regions. This also might not deal very well with the holes seen in the data, which I would like to represent as a different plt.Polygon.
The solution perhaps comes from solving an area maximisation problem: for a set of points defining the data limits, what is the maximum contiguous area contained within those points? To form the enclosed area, what are the neighbouring points for the nth point? And how would the holes be treated in this scheme - is this erring into topology now?
Apologies, much of this is me thinking out loud. I'd be grateful for some hints, suggestions or solutions. I suspect this is an oft-studied problem with many solution techniques, but I'm looking for something simple to code and quick to run... I guess everyone is, really!
~~~~~~~~~~~~~~~~~~~~~~~~~
OK, here's attempt #2 using Mark's idea of convex hulls:
2nd attempt http://astro.dur.ac.uk/~dmurphy/data_limitsv2.png
For this I used qconvex from the qhull package, getting it to return the extreme vertices. For those interested:
cat [data] | qconvex Fx > out
The sampling of the perimeter seems quite low, and although I haven't played much with the settings, I'm not convinced I can improve the fidelity.
I think what you are looking for is the convex hull of the data. That will give a set of points such that, if you connect them, all of your points lie on or inside the resulting boundary.
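For example, with SciPy's ConvexHull the ordered boundary drops straight into plt.Polygon; this is a sketch with random stand-in data, so swap in your own (x, y) array:

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull

points = np.random.rand(1000, 2)           # stand-in for your (x, y) data
hull = ConvexHull(points)
boundary = points[hull.vertices]           # hull vertices, already ordered counter-clockwise

fig, ax = plt.subplots()
ax.add_patch(plt.Polygon(boundary, closed=True, fill=False, edgecolor='k'))
ax.plot(points[:, 0], points[:, 1], '.', ms=2)
ax.autoscale_view()
plt.show()

Note this shares the limitation the qconvex attempt showed: a convex hull cannot follow concave parts of the boundary or the holes in the data.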
I may have missed something, but what's the motivation for not simply determining the minimum and maximum x and y values? Unless you have an enormous amount of data, you could simply iterate through your points and determine the minimum and maximum levels fairly quickly.
This isn't the most efficient example, but if your data set is small this won't be particularly slow:
import random
data = [(random.randint(-100, 100), random.randint(-100, 100)) for i in range(1000)]
x_min = min([point[0] for point in data])
x_max = max([point[0] for point in data])
y_min = min([point[1] for point in data])
y_max = max([point[1] for point in data])