Delete points not within land but with an offset

I have an Android app that uses GPS and reports its position to the backend. For some reason (mock locations or low GPS accuracy, I guess) a lot of measurements were saved with coordinates that are not on land.
I'm writing a short Python pandas/geopandas script to filter those out, but it turns out not to be that trivial.
My initial idea was to join the registered GPS points with hi-res (10m) land shapes.
# df: initial dataframe with points
import geopandas as gp
from shapely.geometry import Point

geometry = [Point(xy) for xy in zip(df.longitude, df.latitude)]
geodf = gp.GeoDataFrame(df[['longitude', 'latitude']], geometry=geometry, crs='EPSG:4326')
world = gp.read_file("../GeoPandas/natural-earth-vector/10m_physical/ne_10m_land.shp")
# Newer GeoPandas releases use predicate= where older ones used op=
gpd_joined = gp.sjoin(geodf, world, how='inner', predicate='intersects', lsuffix='left', rsuffix='right')
However, there are many false positives (shapefile inaccuracy?) along the coastline. I'd like to keep those samples.
(blue -- land polygon, markers - points marked for deletion)
Generally speaking, my idea is to clean up the set by deleting only the most obvious outliers, e.g. points in the middle of the ocean, while keeping a "border buffer" around the coastline.
Unfortunately I have no idea how to create such a border/buffer. Simply expanding the polygon with the scale function scales it up from its centre point. What I'd love to achieve is to expand it uniformly by a desired distance.
I then tried using the "Oceans" shape and scaling it down
gdfOceansSS.geometry.scale(xfact=0.9, yfact=0.9, zfact=1.0, origin=(0,0,0))
but without success.
Any tips appreciated!

OMG, I've just found out that a simple buffer() on the geometry will do the job:
gdfWorldSS.geometry = world.geometry.buffer(1)
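To illustrate what the buffered join tests geometrically, here is a minimal pure-Python sketch (not GeoPandas itself): a point is kept if it lies inside the polygon or within a fixed distance of its boundary, which is the same condition as being contained in the buffered polygon. The square `land` polygon, the sample points and the helper names are hypothetical. Note that with EPSG:4326 coordinates the buffer distance is expressed in degrees, so buffer(1) widens the land by roughly one degree.

```python
import math

def point_in_polygon(x, y, poly):
    """Even-odd ray casting for a simple polygon given as [(x, y), ...]."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def distance_to_segment(x, y, x1, y1, x2, y2):
    """Euclidean distance from (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(x - x1, y - y1)
    # Projection parameter, clamped so the closest point stays on the segment
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))

def within_buffered_polygon(x, y, poly, distance):
    """True if (x, y) is inside poly or within `distance` of its boundary."""
    if point_in_polygon(x, y, poly):
        return True
    n = len(poly)
    return any(
        distance_to_segment(x, y, *poly[i], *poly[(i + 1) % n]) <= distance
        for i in range(n)
    )

# Hypothetical "land" square and three test points (lon, lat in degrees)
land = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [(5.0, 5.0), (10.5, 5.0), (20.0, 5.0)]
kept = [p for p in points if within_buffered_polygon(*p, land, distance=1.0)]
```

The point just off the "coast" survives while the one far out at sea is dropped, which is the behaviour the buffer is meant to give.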

Related

OpenCV recoverPose camera coordinate system

I'm estimating the translation and rotation of a single camera using the following code.
E, mask = cv2.findEssentialMat(k1, k2,
                               focal = SCALE_FACTOR * 2868,
                               pp = (1920/2 * SCALE_FACTOR, 1080/2 * SCALE_FACTOR),
                               method = cv2.RANSAC,
                               prob = 0.999,
                               threshold = 1.0)
points, R, t, mask = cv2.recoverPose(E, k1, k2)
where k1 and k2 are my matching set of key points, which are Nx2 matrices where the first column is the x-coordinates and the second column is y-coordinates.
I collect all the translations over several frames and generate a path that the camera traveled like this.
def generate_path(rotations, translations):
    path = []
    current_point = np.array([0, 0, 0])
    for R, t in zip(rotations, translations):
        path.append(current_point)
        # don't care about rotation of a single point
        current_point = current_point + t.reshape((3,))
    return np.array(path)
So, I have a few issues with this.
The OpenCV camera coordinate system suggests that if I want to view the 2D "top down" view of the camera's path, I should plot the translations along the X-Z plane.
plt.plot(path[:,0], path[:,2])
This is completely wrong.
However, if I write this instead
plt.plot(path[:,0], path[:,1])
I get the following (after doing some averaging)
This path is basically perfect.
So, perhaps I am misunderstanding the coordinate system convention used by cv2.recoverPose? Why should the "birds eye view" of the camera path be along the XY plane and not the XZ plane?
Another, perhaps unrelated issue is that the reported Z-translation appears to decrease linearly, which doesn't really make sense.
I'm pretty sure there's a bug in my code since these issues appear systematic - but I wanted to make sure my understanding of the coordinate system was correct so I can restrict the search space for debugging.

First of all, your method does not actually produce a real path. The translation t produced by recoverPose() is always a unit vector. Thus, in your 'path', every frame moves exactly 1 'meter' from the previous frame. The correct method would be: 1) initialize (featureMatch, findEssentialMatrix, recoverPose), then 2) track (triangulate, featureMatch, solvePnP). If you would like to dig deeper, tutorials on monocular visual SLAM would help.
Secondly, you might have mixed up the camera coordinate system and the world coordinate system. If you want to plot the trajectory, you should use the world coordinate system rather than the camera coordinate system. Besides, the results of recoverPose() are also in the world coordinate system. And the world coordinate system is: x-axis pointing right, y-axis pointing forward, z-axis pointing up. Thus, when you want to plot the bird's-eye view, it is correct to plot along the X-Y plane.
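The unit-vector point can be checked numerically. The following is a small NumPy sketch with made-up motions (`true_steps` is hypothetical, and no OpenCV call is involved): once each translation is reduced to unit length, as the answer says recoverPose() does, the accumulated path has steps of length 1 no matter how far the camera really moved.

```python
import numpy as np

def generate_path(translations):
    """Accumulate translation vectors into a path that starts at the origin."""
    steps = np.asarray(translations, dtype=float).reshape(-1, 3)
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

# Hypothetical true camera motions with very different magnitudes
true_steps = np.array([[0.1, 0.0, 0.0],
                       [5.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0]])

# Direction-only estimates: the scale of each step is lost
unit_steps = true_steps / np.linalg.norm(true_steps, axis=1, keepdims=True)

path = generate_path(unit_steps)
step_lengths = np.linalg.norm(np.diff(path, axis=0), axis=1)
```

Every entry of `step_lengths` comes out as 1, so the shape of such a path reflects only the direction of motion between frames, not the distance travelled.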

Interpolating Scattered Data from a Volume that has Empty Space

I have 3d data produced from mesh points. The structure that was meshed is complex enough that interpolation using griddata is lacking. Specifically, there are regions without data points which are being given values by griddata that are not the fill_value. I need these hollow regions to have the value of 0.0, which I set fill_value to.
A simplified version of this is illustrated below:
The area occupied by the cylinder has no data points but the rest of the cube volume does. There will be data points from interpolation inside the cylinder but I need them to be zero.
Below is a slice parallel to the xy plane of the actual interpolated data with a black oval that approximates the edge of the 'cylinder'. The red and blue 'bleed' into the void after interpolation. The fill value of 0.0 can be seen in the upper left corner:
Any ideas on how I can achieve the goal of setting those values to 0.0? Note that the 'cylinder' is not of constant shape.
I thought about going z layer by z layer and finding a polygon that gives the cylinder shape and then setting points inside the polygon to zero.
I also thought about partitioning the volume so a portion of the cylinder ends up in the corners of the partition (for each z layer) and hoping that the interpolator would not try to extrapolate into the void region.
The first option seems better, but I would like to know if Python provides some sort of functionality which would work better.
EDIT: Here are some actual points from the data set:
The z scale is much smaller than x or y. You can see that the regions I'm interested in are pretty well defined. But, again, how do I identify them for the purposes of setting grid points to 0.0?
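The first option mentioned above (zeroing the grid points inside a per-layer polygon) could be sketched roughly as follows. This is only an illustration under assumptions: the polygon outlining the void in each z layer is taken as already known, and `points_in_polygon`, the 5x5 grid and the square outline are hypothetical stand-ins for the real data.

```python
import numpy as np

def points_in_polygon(x, y, poly):
    """Even-odd ray casting; x and y are same-shape arrays, poly is an (N, 2) array."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    poly = np.asarray(poly, dtype=float)
    px, py = poly[:, 0], poly[:, 1]
    inside = np.zeros(x.shape, dtype=bool)
    n = len(poly)
    for i in range(n):
        j = (i - 1) % n
        # Edges that straddle the horizontal line through each grid point
        straddles = (py[i] > y) != (py[j] > y)
        # Horizontal edges divide by zero here, but they never straddle
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = px[i] + (y - py[i]) * (px[j] - px[i]) / (py[j] - py[i])
        inside ^= straddles & (x < x_cross)
    return inside

# One hypothetical z layer of interpolated values on a 5x5 grid
X, Y = np.meshgrid(np.arange(5), np.arange(5))
layer = np.ones((5, 5))

# Hypothetical outline of the void in this layer
void_outline = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)]

mask = points_in_polygon(X, Y, void_outline)
layer[mask] = 0.0  # force the void to the fill value
```

Repeating this per z layer with a different outline each time would handle a 'cylinder' whose shape is not constant.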

Using pcolormesh for plotting an orbit data

I am trying to map a dataset with associated latitude and longitude. The details of the data I am using are given below:
Variable   Type      Data/Info
---------------------------------------------
lat        ndarray   1826x960, type `float64`
lon        ndarray   1826x960, type `float64`
data       ndarray   1826x960, type `float64`
I then created a basemap:
m = Basemap(projection='cyl', llcrnrlon=-180, urcrnrlon=180, llcrnrlat=-40, urcrnrlat=40, resolution='c')
Now, on the basemap created, I'd plot the above mentioned dataset using pcolormesh:
m.drawcoastlines()
m.drawcountries()
x,y = m(lon,lat)
m.pcolormesh(x,y,data)
m.colorbar()
plt.show()
This gives following figure:
Temp Brightness plot
But if I make a similar plot of a dataset (size 2691x960, likewise for lon and lat) covering the whole longitude range (-180 to 180), I get a 'strange bar':
strange bar
I am pretty sure that the strange bar is caused by overlapping parts of the dataset. The same plot made in MATLAB works fine.
Please tell me what the problem is, what can be done to remove the bar, and what other methods there are for plotting this kind of data in Python.

I think that you are running into a problem that I ran into a little while ago. The problem here is that, when basemap tries to create the polygons, it uses an interpolation method that does not appear to handle the prime meridian correctly. Pixels that actually cross the prime meridian get interpolated into a polygon that extends around the globe.
The solution that I have used is to split the file into two masked arrays (or just mask the original array two different ways at different times), one with the eastern hemisphere masked, and one with the western hemisphere masked, then map them both to the same axes object.
edit: Another solution may be to have your longitude bounds go from -179.99 to 179.99 or something similar.
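The two-mask approach described above might look something like this in NumPy. It is a sketch with a tiny made-up swath (`lon` and `data` here are hypothetical); only the masking step is shown, not the Basemap calls.

```python
import numpy as np

# Hypothetical swath: longitudes spanning -180..180 with matching data values
lon = np.linspace(-180.0, 180.0, 8).reshape(2, 4)
data = np.arange(8, dtype=float).reshape(2, 4)

# Mask one hemisphere at a time; masked cells are left out when plotting
data_west_only = np.ma.masked_where(lon >= 0, data)
data_east_only = np.ma.masked_where(lon < 0, data)
```

Each masked array would then be passed to m.pcolormesh(x, y, ...) in turn, drawing both onto the same axes.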

I haven't worked with anything that gave me this problem, but it looks like a solution to a similar-sounding problem was offered here using the mpl_toolkits.basemap.addcyclic function.
From the docs:
arrout, lonsout = addcyclic(arrin, lonsin) adds cyclic (wraparound) point in longitude to arrin and lonsin, assumes longitude is the right-most dimension of arrin.

How to correlate two time series with gaps and different time bases?

I have two time series of 3D accelerometer data that have different time bases (clocks started at different times, with some very slight creep during the sampling time), as well as containing many gaps of different size (due to delays associated with writing to separate flash devices).
The accelerometers I'm using are the inexpensive GCDC X250-2. I'm running the accelerometers at their highest gain, so the data has a significant noise floor.
The time series each have about 2 million data points (over an hour at 512 samples/sec), and contain about 500 events of interest, where a typical event spans 100-150 samples (200-300 ms each). Many of these events are affected by data outages during flash writes.
So, the data isn't pristine, and isn't even very pretty. But my eyeball inspection shows it clearly contains the information I'm interested in. (I can post plots, if needed.)
The accelerometers are in similar environments but are only moderately coupled, meaning that I can tell by eye which events match from each accelerometer, but I have been unsuccessful so far doing so in software. Due to physical limitations, the devices are also mounted in different orientations, where the axes don't match, but they are as close to orthogonal as I could make them. So, for example, for 3-axis accelerometers A & B, +Ax maps to -By (up-down), +Az maps to -Bx (left-right), and +Ay maps to -Bz (front-back).
My initial goal is to correlate shock events on the vertical axis, though I would eventually like to a) automatically discover the axis mapping, b) correlate activity on the mapped axes, and c) extract behavior differences between the two accelerometers (such as twisting or flexing).
The nature of the time series data makes Python's numpy.correlate() unusable. I've also looked at R's zoo package, but have made no headway with it. I've looked to different fields of signal analysis for help, but I've made no progress.
Anyone have any clues for what I can do, or approaches I should research?
Update 28 Feb 2011: Added some plots here showing examples of the data.

My interpretation of your question: Given two very long, noisy time series, find a shift of one that matches large 'bumps' in one signal to large bumps in the other signal.
My suggestion: interpolate the data so it's uniformly spaced, rectify and smooth the data (assuming the phase of the fast oscillations is uninteresting), and do a one-point-at-a-time cross correlation (assuming a small shift will line up the data).
import numpy
from scipy.ndimage import gaussian_filter
"""
sig1 and sig2 are assumed to be large, 1D numpy arrays
sig1 is sampled at times t1, sig2 is sampled at times t2
t_start, t_end, is your desired sampling interval
t_len is your desired number of measurements
"""
t = numpy.linspace(t_start, t_end, t_len)
sig1 = numpy.interp(t, t1, sig1)
sig2 = numpy.interp(t, t2, sig2)
#Now sig1 and sig2 are sampled at the same points.
"""
Rectify and smooth, so 'peaks' will stand out.
This makes big assumptions about your data;
these assumptions seem true-ish based on your plots.
"""
sigma = 10 #Tune this parameter to get the right smoothing
sig1, sig2 = abs(sig1), abs(sig2)
sig1, sig2 = gaussian_filter(sig1, sigma), gaussian_filter(sig2, sigma)
"""
Now sig1 and sig2 should look smoothly varying, with humps at each 'event'.
Hopefully we can search a small range of shifts to find the maximum of the
cross-correlation. This assumes your data are *nearly* lined up already.
"""
max_xc = 0
best_shift = 0
for shift in range(-10, 11): #Tune this search range (here -10..10 inclusive)
    xc = (numpy.roll(sig1, shift) * sig2).sum()
    if xc > max_xc:
        max_xc = xc
        best_shift = shift
print('Best shift:', best_shift)
"""
If best_shift is at the edges of your search range,
you should expand the search range.
"""

If the data contains gaps of unknown sizes that are different in each time series, then I would give up on trying to correlate entire sequences, and instead try cross correlating pairs of short windows on each time series, say overlapping windows twice the length of a typical event (300 samples long). Find potential high cross correlation matches across all possibilities, and then impose a sequential ordering constraint on the potential matches to get sequences of matched windows.
From there you have smaller problems that are easier to analyze.
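As a rough sketch of the window-matching step only (the sequential ordering constraint is not shown), a normalized cross-correlation of two equal-length windows could be computed like this. The function name `best_lag` and the synthetic windows are hypothetical, and the normalization assumes both windows have the same length and are not constant.

```python
import numpy as np

def best_lag(window_a, window_b):
    """Return (lag, score) at the peak of the normalized cross-correlation.

    A positive lag means the feature appears that many samples later in
    window_a than in window_b. A score near 1 indicates a close match.
    """
    a = np.asarray(window_a, dtype=float)
    b = np.asarray(window_b, dtype=float)
    # Zero-mean, unit-variance scaling so a perfect match scores about 1
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    xc = np.correlate(a, b, mode="full")
    lags = np.arange(-(len(b) - 1), len(a))
    k = int(np.argmax(xc))
    return int(lags[k]), float(xc[k])

# Two synthetic 300-sample windows with the same bump 20 samples apart
bump = np.hanning(12)
win_a = np.zeros(300)
win_b = np.zeros(300)
win_a[70:82] = bump
win_b[50:62] = bump

lag, score = best_lag(win_a, win_b)
```

Window pairs whose score exceeds some threshold would become the candidate matches to which the ordering constraint is then applied.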

This isn't a technical answer, but it might help you come up with one:
1. Convert the plot to an image, and stick it into a decent image program like gimp or photoshop
2. break the plots into discrete images whenever there's a gap
3. put the first series of plots in a horizontal line
4. put the second series in a horizontal line right underneath it
5. visually identify the first correlated event
6. if the two events are not lined up vertically:
   - select whichever instance is further to the left and everything to the right of it on that row
   - drag those things to the right until they line up
This is pretty much how an audio editor works, so if you converted it into a simple audio format like an uncompressed WAV file, you could manipulate it directly in something like Audacity. (It'll sound horrible, of course, but you'll be able to move the data plots around pretty easily.)
Actually, audacity has a scripting language called nyquist, too, so if you don't need the program to detect the correlations (or you're at least willing to defer that step for the time being) you could probably use some combination of audacity's markers and nyquist to automate the alignment and export the clean data in your format of choice once you tag the correlation points.

My guess is, you'll have to manually build an offset table that aligns the "matches" between the series. Below is an example of a way to get those matches. The idea is to shift the data left-right until it lines up and then adjust the scale until it "matches". Give it a try.
library(rpanel)
#Generate the x1 and x2 data
n1 <- rnorm(500)
n2 <- rnorm(200)
x1 <- c(n1, rep(0,100), n2, rep(0,150))
x2 <- c(rep(0,50), 2*n1, rep(0,150), 3*n2, rep(0,50))
#Build the panel function that will draw/update the graph
lvm.draw <- function(panel) {
  plot(x=(1:length(panel$dat3))+panel$off, y=panel$dat3, ylim=panel$dat1, xlab="", ylab="y", main=paste("Alignment Graph Offset = ", panel$off, " Scale = ", panel$sca, sep=""), typ="l")
  lines(x=1:length(panel$dat3), y=panel$sca*panel$dat4, col="red")
  grid()
  panel
}
#Build the panel
xlimdat <- c(1, length(x1))
ylimdat <- c(-5, 5)
panel <- rp.control(title = "Eye-Ball-It", dat1=ylimdat, dat2=xlimdat, dat3=x1, dat4=x2, off=100, sca=1.0, size=c(300, 160))
rp.slider(panel, var=off, from=-500, to=500, action=lvm.draw, title="Offset", pos=c(5, 5, 290, 70), showvalue=TRUE)
rp.slider(panel, var=sca, from=0, to=2, action=lvm.draw, title="Scale", pos=c(5, 70, 290, 90), showvalue=TRUE)

It sounds like you want to minimize the function (Ax'+By) + (Az'+Bx) + (Ay'+Bz) over a pair of values, namely the time offset t0 and a time scale factor tr, where Ax' = tr*(Ax + t0), etc.
I would look into SciPy's bivariate optimize functions. And I would use a mask or temporarily zero the data (both Ax' and By for example) over the "gaps" (assuming the gaps can be programmatically determined).
To make the process more efficient, start with a coarse sampling of A and B, but set the precision in fmin (or whatever optimizer you've selected) to be commensurate with your sampling. Then proceed with progressively finer-sampled windows of the full dataset until your windows are narrow and are not down-sampled.
Edit - matching axes
Regarding the issue of trying to identify which axis is co-linear with a given axis, and not knowing a thing about the characteristics of your data, I can point towards a similar question. Look into pHash or any of the other methods outlined in this post to help identify similar waveforms.

Estimating the boundary of arbitrarily distributed data

I have two dimensional discrete spatial data. I would like to make an approximation of the spatial boundaries of this data so that I can produce a plot with another dataset on top of it.
Ideally, this would be an ordered set of (x,y) points that matplotlib can plot with the plt.Polygon() patch.
My initial attempt is very inelegant: I place a fine grid over the data, and where data is found in a cell, a square matplotlib patch is created of that cell. The resolution of the boundary thus depends on the sampling frequency of the grid. Here is an example, where the grey region are the cells containing data, black where no data exists.
1st attempt http://astro.dur.ac.uk/~dmurphy/data_limits.png
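For reference, the grid-occupancy attempt described above can be expressed compactly with NumPy. This is a sketch, not the author's code: `occupied_cells` is a hypothetical helper and the four sample points are made up.

```python
import numpy as np

def occupied_cells(x, y, bins):
    """Boolean grid marking the cells that contain at least one data point."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    return counts > 0, x_edges, y_edges

# Four hypothetical points clustered in two opposite corners
x = np.array([0.1, 0.2, 0.9, 0.95])
y = np.array([0.1, 0.15, 0.9, 0.85])

occupied, x_edges, y_edges = occupied_cells(x, y, bins=2)
```

The first axis of `occupied` indexes x bins and the second indexes y bins. As noted above, the resolution of the resulting boundary is set entirely by the `bins` argument.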
OK, problem solved - why am I still here? Well.... I'd like a more "elegant" solution, or at least one that is faster (ie. I don't want to get on with "real" work, I'd like to have some fun with this!). The best way I can think of is a ray-tracing approach - eg:
1. from xmin to xmax, at y=ymin, check if data boundary crossed in intervals dx
2. y=ymin+dy, do 1
3. do 1-2, but now sample in y
An alternative is defining a centre, and sampling in r-theta space - ie radial spokes in dtheta increments.
Both would produce a set of (x,y) points, but then how do I order/link neighbouring points to create the boundary?
A nearest neighbour approach is not appropriate as, for example (to borrow from Geography), an isthmus (think of Panama connecting N&S America) could then close off and isolate regions. This also might not deal very well with the holes seen in the data, which I would like to represent as a different plt.Polygon.
The solution perhaps comes from solving an area maximisation problem. For a set of points defining the data limits, what is the maximum contiguous area contained within those points? To form the enclosed area, what are the neighbouring points for the nth point? How will the holes be treated in this scheme - is this erring into topology now?
Apologies, much of this is me thinking out loud. I'd be grateful for some hints, suggestions or solutions. I suspect this is an oft-studied problem with many solution techniques, but I'm looking for something simple to code and quick to run... I guess everyone is, really!
~~~~~~~~~~~~~~~~~~~~~~~~~
OK, here's attempt #2 using Mark's idea of convex hulls:
alt text http://astro.dur.ac.uk/~dmurphy/data_limitsv2.png
For this I used qconvex from the qhull package, getting it to return the extreme vertices. For those interested:
cat [data] | qconvex Fx > out
The sampling of the perimeter seems quite low, and although I haven't played much with the settings, I'm not convinced I can improve the fidelity.

I think what you are looking for is the Convex Hull of the data. That will give a set of points that, if connected, will mean that all your points are on or inside the connected points.
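A convex hull can be computed in a few lines without external tools. The sketch below uses Andrew's monotone chain algorithm (one of several standard choices, and not necessarily what qconvex does internally) and returns the vertices in counter-clockwise order, so they could be handed straight to a polygon patch. The sample `data` is made up. Keep in mind that a convex hull cannot follow concavities or represent holes.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

# Hypothetical data: four corner points plus three interior points
data = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (3, 2)]
hull = convex_hull(data)
```

Only the four corners remain in `hull`; the interior points are discarded.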

I may have missed something, but why not simply determine the maximum and minimum x and y values? Unless you have an enormous amount of data, you could simply iterate through your points and determine the minimum and maximum values fairly quickly.
This isn't the most efficient example, but if your data set is small this won't be particularly slow:
import random
data = [(random.randint(-100, 100), random.randint(-100, 100)) for i in range(1000)]
x_min = min([point[0] for point in data])
x_max = max([point[0] for point in data])
y_min = min([point[1] for point in data])
y_max = max([point[1] for point in data])
