I'm using a Kalman filter to track the position of a vehicle. For my measurements I have a GPX file (WGS84 format) containing the latitude, longitude, elevation and timestamp of each point reported by the GPS. From this data I computed the distance between consecutive GPS points (using the geodesic distance and the Vincenty formula), and since the timestamps are known, the time delta between points can be computed as well. With the distance and the time delta, I can calculate the velocity (= distance between points / time delta), which could then also be used as a measurement input to the Kalman filter.
However, I have read that this is only the average velocity over the segment and not the instantaneous velocity at any given point. To obtain the instantaneous velocity, some suggest taking a running average, while some implementations compute the velocity from the time difference between the current point and the very first point. I'm a bit confused about which method I should use to implement this in Python.
Firstly, is the method used in my implementation a correct way to calculate velocity? (I have also read that Doppler shift can be used, but sadly I only collect GPS data through a running app (Strava) on my iPhone.)
How can the instantaneous velocity at every GPS point be calculated from my implementation? (Is the bearing information also necessary?)
What would the error of this computed velocity be? (The position error from an iPhone can be about 10 metres and the error of the distance computation itself is about 1 mm, and I want the result to be as accurate as possible.)
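For context on the last question, a rough propagation-of-uncertainty estimate for v = d/\Delta t, treating the two endpoint position errors as independent (a pessimistic assumption, since consecutive GPS fixes are usually strongly correlated), would be:
\sigma_d \approx \sqrt{2}\,\sigma_p \approx \sqrt{2}\cdot 10\,\mathrm{m} \approx 14\,\mathrm{m}, \qquad \sigma_v \approx \sigma_d / \Delta t \approx 14\,\mathrm{m/s}\ \text{for}\ \Delta t = 1\,\mathrm{s}
so the per-point position error, not the ~1 mm distance-formula error, would dominate the velocity error.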
Current Implementation
import gpxpy
import pandas as pd
import numpy as np
from geopy.distance import vincenty, geodesic  # note: vincenty was removed in geopy 2.0
import matplotlib.pyplot as plt

# Import GPS data
with open('my_run_001.gpx') as fh:
    gpx_file = gpxpy.parse(fh)
segment = gpx_file.tracks[0].segments[0]
coords = pd.DataFrame([
    {'lat': p.latitude,
     'lon': p.longitude,
     'ele': p.elevation,
     } for p in segment.points])

# Compute delta between timestamps (in seconds)
times = pd.Series([p.time for p in segment.points], name='time')
dt = np.diff(times.values) / np.timedelta64(1, 's')

# Find distance between consecutive points using the Vincenty and geodesic methods
vx = []
for i in range(len(coords.lat) - 1):
    if i <= 2425:  # cap kept from the original code
        vincenty_distance = vincenty([coords.lat[i], coords.lon[i]],
                                     [coords.lat[i + 1], coords.lon[i + 1]]).meters
        vx.append(vincenty_distance)
print(vx)

vy = []
for i in range(len(coords.lat) - 1):
    if i <= 2425:
        geodesic_distance = geodesic([coords.lat[i], coords.lon[i]],
                                     [coords.lat[i + 1], coords.lon[i + 1]]).meters
        vy.append(geodesic_distance)
print(vy)

# Compute and plot velocity
velocity = np.asarray(vx) / dt
time = np.arange(len(dt))
plt.plot(time, velocity)   # time on the x axis, velocity on the y axis
plt.xlabel('time')
plt.ylabel('velocity')
plt.title('Plot of Velocity vs Time')
plt.show()
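For reference, a minimal sketch of the running-average idea mentioned above, applied to the per-segment velocities computed by the snippet (it reuses velocity from the code above; the window length is an arbitrary assumption):
# Sketch only: smooth the per-segment (average) velocities with a centred moving average
window = 5                                   # arbitrary; tune to the GPX sampling rate
kernel = np.ones(window) / window
velocity_smooth = np.convolve(velocity, kernel, mode='same')   # rough per-point estimate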
Reference for GPX Data:
https://github.com/stevenvandorpe/testdata/blob/master/gps_coordinates/gpx/my_run_001.gpx
Interesting topic.
If you are planning to use standalone GPS location output to calculate the velocity of an object, be prepared for some uncertainty in the results. As I'm sure you know, there are propagation delays in the whole process, so there are several things you need to pay attention to.
1.
Basically, taking the distance and the time and then calculating velocity from those deltas is the right approach, but as you said, that is the average velocity between two GPS measurements, since GPS has some propagation delay by its nature.
2.
As noted above, this kind of calculation gives the average velocity as a function of the time delta and the distance, and by nature we cannot change that. What we can change is the frequency at which the GPS signal is sampled, and thereby increase or decrease the real-time accuracy of the system.
SUGGESTION: If you want somewhat more accurate real-time velocity data, I suggest involving your phone's gyroscope sensor and processing its output. Taking the first delta (average) speed from GPS and then tracking the gyro changes would be an interesting way to continue with your idea.
3.
Let's say you are walking (or super-speed running :)) with your device. At one moment the device sends a request for a GPS location but, due to some issue (perhaps a bad satellite connection), the response arrives with a 10-second delay. For the purpose of the example, assume you are walking in a perfectly straight line on perfectly flat ground :) One minute after receiving that response you send another request for a GPS location, and this time the response, delayed by 2 seconds, tells you that you have walked 300 m north of the previous measurement. If you measure from one request to the next, you would conclude your speed was 300/70 = 4.28 m/s (quite impressive), but here is one possible actual scenario:
- You didn't walk 300 m, you walked 270 m (GPS error)
- The time between the two received measurements is about 62 s
- You were actually moving slightly faster, at 270/62 ≈ 4.35 m/s
With a phone it is tricky: you cannot measure when the request was actually sent over the air, or in which millisecond the response arrived; those things are only possible when you work with the sensors close to the hardware. So you will certainly lose some accuracy.
Related
I have data from a number of high-frequency data capture devices connected to generators on an electricity grid. These meters collect data in ~1-second "bursts" at ~1.25 ms intervals, i.e. fast enough to actually see the waveform. See the graphs below showing voltage and current for the three phases in different colours.
This time series has a changing fundamental frequency, i.e. the frequency of the electricity grid changes over the length of the series. I want to roll this (messy) waveform data up to summary statistics of frequency and phase angle for each phase, calculated/estimated every 20 ms (approximately once per cycle).
The simplest way I can think of would be to measure the gaps between the zero crossings (y = 0) of each wave and use the offsets to calculate the phase angle. Is there a neat way to achieve this (i.e. a table of interpolated x values for which y = 0)?
However the above may be quite noisy, and I was wondering if there is a more mathematically elegant way of estimating a changing frequency and phase angle with pandas/scipy etc. I know there are some sophisticated techniques available for periodic functions but I'm not familiar enough with them. Any suggestions would be appreciated :)
Here's a "toy" data set of the first few waves as a pandas Series:
import pandas as pd, datetime as dt
ds_waveform = pd.Series(
index = pd.date_range('2020-08-23 12:35:37.017625', '2020-08-23 12:35:37.142212890', periods=100),
data = [ -9982., -110097., -113600., -91812., -48691., -17532.,
24452., 75533., 103644., 110967., 114652., 92864.,
49697., 18402., -23309., -74481., -103047., -110461.,
-113964., -92130., -49373., -18351., 24042., 75033.,
103644., 111286., 115061., 81628., 61614., 19039.,
-34408., -62428., -103002., -110734., -114237., -92858.,
-49919., -19124., 23542., 74987., 103644., 111877.,
115379., 82720., 62251., 19949., -33953., -62382.,
-102820., -111053., -114555., -81941., -62564., -19579.,
34459., 62706., 103325., 111877., 115698., 83084.,
62888., 20949., -33362., -61791., -102547., -111053.,
-114919., -82805., -62882., -20261., 33777., 62479.,
103189., 112195., 116380., 83630., 63843., 21586.,
-32543., -61427., -102410., -111553., -115374., -83442.,
-63565., -21217., 33276., 62024., 103007., 112468.,
116471., 84631., 64707., 22405., -31952., -61108.,
-101955., -111780., -115647., -84261.])
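For what it's worth, here is a rough sketch of the zero-crossing idea on the toy series above; it interpolates the crossing times linearly and turns the spacing of successive upward crossings into one frequency estimate per cycle. It is only an illustration, not a robust estimator; the phase angle between phases could be estimated the same way, by comparing the interpolated crossing times of each phase within the same cycle.
import numpy as np

y = ds_waveform.to_numpy(dtype=float)
t = (ds_waveform.index - ds_waveform.index[0]).total_seconds().to_numpy()

# Samples where the signal crosses zero on the way up
s = np.sign(y)
up = np.where((s[:-1] < 0) & (s[1:] > 0))[0]

# Linearly interpolate the crossing time inside each bracketing sample pair
t_cross = t[up] - y[up] * (t[up + 1] - t[up]) / (y[up + 1] - y[up])

# One frequency estimate per cycle from the gaps between upward crossings
freq_hz = 1.0 / np.diff(t_cross)
print(freq_hz)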
I was using the Google Maps Distance Matrix API in Python to calculate bicycling distances between two points, using latitude and longitude. I was using a loop to calculate almost 300,000 rows of data for a student project (I am studying Data Science with Python). I added a debug line to output the row number and distance every 10,000 rows, but after it hummed away for a while with no results, I stopped the kernel and changed it to every 1,000 rows. With that, after about 5 minutes it finally got to row 1,000. After over an hour it was only on row 70,000. Unbelievable. I stopped execution, and later that day got an email from Google saying I had used up my free trial. So not only did it run incredibly slowly, I can't even use it at all anymore for a student project without incurring enormous fees.
So I rewrote the code to use geometry and just calculate "as the crow flies" distance. Not really what I want, but short of any alternatives, that's my only option.
Does anyone know of another (open-source, free) way to calculate distance to get what I want, or how to use the google distance matrix API more efficiently?
thanks,
So here is some more information, as suggested. I am trying to calculate distances between "stations", and am given latitudes and longitudes for about 300K pairs. I was going to set up a function and then apply it to the dataframe (bear with me, I'm still new to Python and dataframes), but for now I am using a loop to go through all the pairs. Here is my code:
i = 0
while i < len(trip):
    from_coords = str(result.loc[i, 'from_lat']) + " " + str(result.loc[i, 'from_long'])
    to_coords = str(result.loc[i, 'to_lat']) + " " + str(result.loc[i, 'to_long'])
    # now to get distances!!!
    distance = gmaps.distance_matrix([from_coords],  # origin lat & long, formatted for gmaps
                                     [to_coords],    # destination lat & long, formatted for gmaps
                                     mode='bicycling')['rows'][0]['elements'][0]  # mode=bicycling to use streets for cycling
    result.loc[i, 'distance'] = distance['distance']['value']  # store per row (was overwriting the whole column)
    # added this bit to see how quickly/slowly the code is running
    # ... and btw it's running very slowly. had the debug line at 10000 and changed it to 1000
    # ... and i am running on an i9-9900K with 48GB ram -- why so slow?
    if i % 1000 == 0:
        print(distance['distance']['value'])
    i += 1
You could approximate the distance in km with the haversine distance.
Here I have my coordinates as lat/long pairs in random_distances, a NumPy array of shape (300000, 2):
import numpy as np
from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('haversine')
random_distances = np.random.random( (300000,2) )
Then we can approximate the distances with:
distances = np.zeros(random_distances.shape[0] - 1)   # one distance per consecutive pair (was "- 2", an off-by-one)
for idx in range(random_distances.shape[0] - 1):
    pair = np.radians(random_distances[idx:idx + 2])
    distances[idx] = dist.pairwise(pair, pair)[0][1]
distances *= 6371000 / 1000  # Earth radius in metres, divided by 1000 to get km
distances now contains the distances.
It is all right in terms of speed, but it can be improved: the for loop can be removed entirely, and each call computes a 2x2 distance matrix of which only one entry is used.
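For example, a loop-free version with plain NumPy (continuing the snippet above, and assuming the first column is latitude and the second longitude, in degrees) might look like this sketch:
lat = np.radians(random_distances[:, 0])
lon = np.radians(random_distances[:, 1])
dlat = np.diff(lat)
dlon = np.diff(lon)
# Haversine formula, evaluated for every consecutive pair at once
a = np.sin(dlat / 2)**2 + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlon / 2)**2
distances_km = 2 * 6371 * np.arcsin(np.sqrt(a))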
The haversine distance is a good approximation, but not exact (unlike, I imagine, the API):
From sklearn:
As the Earth is nearly spherical, the haversine formula provides a good approximation of the distance between two points of the Earth surface, with a less than 1% error on average.
I need to calculate the solar zenith angle for approximately 106,000,000 different coordinates. These coordinates correspond to the pixels of an image projected onto the Earth's surface after the image was taken by a camera on an airplane.
I am using pvlib.solarposition.get_solarposition() to calculate the solar zenith angle. The returned values are correct (I compared some results with the NOAA website), but because I need to calculate the angle for so many coordinate pairs, Python takes many days (about 5) to finish executing the function.
Since I am a beginner in programming, I wonder whether there is any way to accelerate the solar zenith angle calculation.
Below is the part of the code that calculates the solar zenith angle:
sol_apar_zen = []
for i in range(size3):
    solar_position = np.array(pvl.solarposition.get_solarposition(Data_time_index, lat_long[i][0], lat_long[i][1]))
    sol_apar_zen.append(solar_position[0][0])
print(len(sol_apar_zen))
Technically, if you need to compute the solar zenith angle quickly for a large list (array), there are more efficient algorithms than PVLIB's, for example the one described by Roberto Grena in 2012 (https://doi.org/10.1016/j.solener.2012.01.024).
I found a suitable implementation here: https://github.com/david-salac/Fast-SZA-and-SAA-computation (you might need some tweaks, but it's simple to use, and it is also implemented in languages other than Python, such as C/C++ and Go).
Example of how to use it:
from sza_saa_grena import solar_zenith_and_azimuth_angle
# ...
# A random time series:
time_array = pd.date_range("2020/1/1", periods=87_600, freq="10T", tz="UTC")
sza, saa = solar_zenith_and_azimuth_angle(longitude=-0.12435,  # London longitude
                                          latitude=51.48728,   # London latitude
                                          time_utc=time_array)
The unit tests (in the project's folder) show that within the normal latitude range the error is minimal.
Since your coordinates represent a grid, another option would be to calculate the zenith angle for a subset of your coordinates and then do a 2-D interpolation to obtain the remainder (a sketch follows below). Taking 1 point in 100 in both directions would reduce your calculation time by a factor of 10,000.
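A minimal sketch of that approach with SciPy, using made-up grid axes and a placeholder in place of the pvlib call (all names and sizes here are purely illustrative):
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical full-resolution pixel grid axes (replace with the real lat/lon axes)
lat_full = np.linspace(-10.0, -9.0, 1000)
lon_full = np.linspace(-55.0, -54.0, 1000)

# Coarse subset: 1 point in 100 in each direction
lat_coarse = lat_full[::100]
lon_coarse = lon_full[::100]

# Placeholder for the expensive zenith-angle computation (the pvlib call in practice)
def sza_placeholder(lat2d, lon2d):
    return 45.0 + 0.1 * lat2d + 0.05 * lon2d

lat2d, lon2d = np.meshgrid(lat_coarse, lon_coarse, indexing='ij')
sza_coarse = sza_placeholder(lat2d, lon2d)

# Interpolate (and extrapolate at the far edges) back onto the full grid
interp = RegularGridInterpolator((lat_coarse, lon_coarse), sza_coarse,
                                 bounds_error=False, fill_value=None)
full_pts = np.stack(np.meshgrid(lat_full, lon_full, indexing='ij'), axis=-1)
sza_full = interp(full_pts.reshape(-1, 2)).reshape(len(lat_full), len(lon_full))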
If you want to speed up this calculation you can use the numba core (if numba is installed):
location.get_solarposition(
    datetimes,
    method='nrel_numba'
)
Otherwise you would have to implement your own calculation based on vectorized numpy arrays. I know it is possible, but I am not allowed to share our implementation. You can find the formulation if you search for the Spencer (1971) solar position equations.
I have read the following sentence:
Figure 3 depicts how the pressure develops during a touch event. It shows the mean over all button touches from all users. To account for the different hold times of the touch events, the time axis has been normalized before averaging the pressure values.
They measured the touch pressure over several touch events and made a plot. I think normalizing the time axis means scaling it to 1 s, for example. But how can this be done? Let's say I have a measurement that spans 3.34 seconds (1000 timestamps and 1000 measurements). How can I normalize this measurement?
If you want to normalize your data you can do as you suggest and simply calculate:
z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}
where z_i is your i-th normalized time value and x_i is the corresponding original value.
An example using numpy:
import numpy
x = numpy.random.rand(10) # generate 10 random values
normalized = (x-min(x))/(max(x)-min(x))
print(x,normalized)
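Applied to the time axis of the 3.34-second example in the question, and then resampled onto a common grid so that several events can be averaged, this might look like the following sketch (the 100-point grid is an arbitrary choice):
import numpy as np

# Hypothetical single touch event: 1000 timestamps over ~3.34 s and 1000 pressure values
t = np.linspace(0.0, 3.34, 1000)
pressure = np.random.rand(1000)

# Normalize the time axis of this event to [0, 1]
t_norm = (t - t.min()) / (t.max() - t.min())

# Resample onto a common normalized grid; doing this for every event makes
# them the same length, so the pressure values can then be averaged
common_grid = np.linspace(0.0, 1.0, 100)
pressure_resampled = np.interp(common_grid, t_norm, pressure)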
I have two time series of 3D accelerometer data that have different time bases (clocks started at different times, with some very slight creep during the sampling time), as well as containing many gaps of different size (due to delays associated with writing to separate flash devices).
The accelerometers I'm using are the inexpensive GCDC X250-2. I'm running the accelerometers at their highest gain, so the data has a significant noise floor.
The time series each have about 2 million data points (over an hour at 512 samples/sec), and contain about 500 events of interest, where a typical event spans 100-150 samples (200-300 ms each). Many of these events are affected by data outages during flash writes.
So, the data isn't pristine, and isn't even very pretty. But my eyeball inspection shows it clearly contains the information I'm interested in. (I can post plots, if needed.)
The accelerometers are in similar environments but are only moderately coupled, meaning that I can tell by eye which events match from each accelerometer, but I have been unsuccessful so far doing so in software. Due to physical limitations, the devices are also mounted in different orientations, where the axes don't match, but they are as close to orthogonal as I could make them. So, for example, for 3-axis accelerometers A & B, +Ax maps to -By (up-down), +Az maps to -Bx (left-right), and +Ay maps to -Bz (front-back).
My initial goal is to correlate shock events on the vertical axis, though I would eventually like to a) automatically discover the axis mapping, b) correlate activity on the mapped axes, and c) extract behaviour differences between the two accelerometers (such as twisting or flexing).
The nature of the time series data makes Python's numpy.correlate() unusable. I've also looked at R's zoo package, but have made no headway with it. I've looked to different fields of signal analysis for help, but I've made no progress.
Anyone have any clues for what I can do, or approaches I should research?
Update 28 Feb 2011: Added some plots here showing examples of the data.
My interpretation of your question: Given two very long, noisy time series, find a shift of one that matches large 'bumps' in one signal to large bumps in the other signal.
My suggestion: interpolate the data so it's uniformly spaced, rectify and smooth the data (assuming the phase of the fast oscillations is uninteresting), and do a one-point-at-a-time cross correlation (assuming a small shift will line up the data).
import numpy
from scipy.ndimage import gaussian_filter
"""
sig1 and sig2 are assumed to be large, 1D numpy arrays
sig1 is sampled at times t1, sig2 is sampled at times t2
t_start, t_end is your desired sampling interval
t_len is your desired number of measurements
"""
t = numpy.linspace(t_start, t_end, t_len)
sig1 = numpy.interp(t, t1, sig1)
sig2 = numpy.interp(t, t2, sig2)
# Now sig1 and sig2 are sampled at the same points.
"""
Rectify and smooth, so 'peaks' will stand out.
This makes big assumptions about your data;
these assumptions seem true-ish based on your plots.
"""
sigma = 10  # Tune this parameter to get the right smoothing
sig1, sig2 = abs(sig1), abs(sig2)
sig1, sig2 = gaussian_filter(sig1, sigma), gaussian_filter(sig2, sigma)
"""
Now sig1 and sig2 should look smoothly varying, with humps at each 'event'.
Hopefully we can search a small range of shifts to find the maximum of the
cross-correlation. This assumes your data are *nearly* lined up already.
"""
max_xc = 0
best_shift = 0
for shift in range(-10, 10):  # Tune this search range
    xc = (numpy.roll(sig1, shift) * sig2).sum()
    if xc > max_xc:
        max_xc = xc
        best_shift = shift
print('Best shift:', best_shift)
"""
If best_shift is at the edges of your search range,
you should expand the search range.
"""
If the data contains gaps of unknown sizes that are different in each time series, then I would give up on trying to correlate entire sequences, and instead try cross correlating pairs of short windows on each time series, say overlapping windows twice the length of a typical event (300 samples long). Find potential high cross correlation matches across all possibilities, and then impose a sequential ordering constraint on the potential matches to get sequences of matched windows.
From there you have smaller problems that are easier to analyze.
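A rough sketch of that windowed matching, assuming both signals are already resampled onto a common, uniform time base as in the answer above (window, step and search-range sizes are arbitrary choices):
import numpy as np

def best_window_matches(sig_a, sig_b, win=300, step=150, max_lag=2000):
    """For each window of sig_a, find the start index in sig_b with the highest
    normalized cross-correlation. Returns a list of (window_start, best_lag, score)."""
    matches = []
    for start in range(0, len(sig_a) - win, step):
        a = sig_a[start:start + win]
        a = (a - a.mean()) / (a.std() + 1e-12)
        best_lag, best_score = start, -np.inf
        lo = max(0, start - max_lag)
        hi = min(len(sig_b) - win, start + max_lag)
        for lag in range(lo, hi):
            b = sig_b[lag:lag + win]
            b = (b - b.mean()) / (b.std() + 1e-12)
            score = float(np.dot(a, b)) / win
            if score > best_score:
                best_lag, best_score = lag, score
        matches.append((start, best_lag, best_score))
    return matches
The sequential-ordering constraint can then be applied to the returned (start, lag) pairs, for example by keeping only a longest subsequence whose lags are non-decreasing.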
This isn't a technical answer, but it might help you come up with one:
- Convert the plots to an image, and load it into a decent image program like GIMP or Photoshop
- Break the plots into discrete images wherever there's a gap
- Put the first series of plots in a horizontal line
- Put the second series in a horizontal line right underneath it
- Visually identify the first correlated event
- If the two events are not lined up vertically:
  - Select whichever instance is further to the left, plus everything to the right of it on that row
  - Drag those things to the right until they line up
This is pretty much how an audio editor works, so if you converted the data into a simple audio format like an uncompressed WAV file, you could manipulate it directly in something like Audacity. (It'll sound horrible, of course, but you'll be able to move the data plots around pretty easily.)
Audacity also has a scripting language called Nyquist, so if you don't need the program to detect the correlations (or you're at least willing to defer that step for the time being), you could probably use some combination of Audacity's markers and Nyquist to automate the alignment and export the clean data in your format of choice once you tag the correlation points.
My guess is, you'll have to manually build an offset table that aligns the "matches" between the series. Below is an example of a way to get those matches. The idea is to shift the data left-right until it lines up and then adjust the scale until it "matches". Give it a try.
library(rpanel)

# Generate the x1 and x2 data
n1 <- rnorm(500)
n2 <- rnorm(200)
x1 <- c(n1, rep(0,100), n2, rep(0,150))
x2 <- c(rep(0,50), 2*n1, rep(0,150), 3*n2, rep(0,50))

# Build the panel function that will draw/update the graph
lvm.draw <- function(panel) {
  plot(x=(1:length(panel$dat3))+panel$off, y=panel$dat3, ylim=panel$dat1, xlab="", ylab="y",
       main=paste("Alignment Graph Offset = ", panel$off, " Scale = ", panel$sca, sep=""), typ="l")
  lines(x=1:length(panel$dat3), y=panel$sca*panel$dat4, col="red")
  grid()
  panel
}

# Build the panel
xlimdat <- c(1, length(x1))
ylimdat <- c(-5, 5)
panel <- rp.control(title = "Eye-Ball-It", dat1=ylimdat, dat2=xlimdat, dat3=x1, dat4=x2,
                    off=100, sca=1.0, size=c(300, 160))
rp.slider(panel, var=off, from=-500, to=500, action=lvm.draw, title="Offset",
          pos=c(5, 5, 290, 70), showvalue=TRUE)
rp.slider(panel, var=sca, from=0, to=2, action=lvm.draw, title="Scale",
          pos=c(5, 70, 290, 90), showvalue=TRUE)
It sounds like you want to minimize the combined mismatch of (Ax'+By), (Az'+Bx) and (Ay'+Bz) (for example as a sum of squares) over a pair of values: namely the time offset t0 and a time scale factor tr, where Ax' is Ax resampled at the times tr*(t + t0), and so on.
I would look into SciPy's bivariate optimization functions. I would also use a mask, or temporarily zero the data (both Ax' and By, for example), over the "gaps" (assuming the gaps can be determined programmatically).
To make the process more efficient, start with a coarse sampling of A and B, but set the precision in fmin (or whichever optimizer you've selected) to be commensurate with that sampling. Then proceed with progressively finer-sampled windows of the full dataset until your windows are narrow and no longer down-sampled.
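For one axis pair, a minimal sketch with scipy.optimize.fmin might look like this (the squared-error objective and the interpolation step are my assumptions, not a prescription, and tA, Ax, tB, By stand in for your own time/value arrays):
import numpy as np
from scipy.optimize import fmin

# tA, Ax: sample times and values of accelerometer A's x axis
# tB, By: sample times and values of accelerometer B's y axis (expected to be roughly -Ax)
def mismatch(params, tA, Ax, tB, By):
    t0, tr = params
    # Shift and scale A's time base, resample onto B's time base, and measure
    # how badly the mapped axes cancel (+Ax should map to -By)
    Ax_on_B = np.interp(tB, tr * (tA + t0), Ax)
    return np.sum((Ax_on_B + By) ** 2)

# Usage with your own arrays:
# best_t0, best_tr = fmin(mismatch, x0=[0.0, 1.0], args=(tA, Ax, tB, By))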
Edit - matching axes
Regarding the issue of identifying which axis is co-linear with a given axis, without knowing anything about the characteristics of your data, I can point towards a similar question: look into pHash or any of the other methods outlined in that post to help identify similar waveforms.