I need to calculate the solar zenith angle for approximately 106,000,000 different coordinates. These coordinates correspond to the pixels of an image projected onto the Earth's surface after the image was taken by a camera on board an airplane.
I am using pvlib.solarposition.get_solarposition() to calculate the solar zenith angle. The values returned are correct (I compared some results with the NOAA website), but because I need the solar zenith angle for so many coordinate pairs, Python takes several days (about 5) to finish.
As I am a beginner in programming, I wonder if there is any way to speed up the solar zenith angle calculation.
Below is the part of the code that calculates the solar zenith angle:
sol_apar_zen = []
for i in range(size3):
    solar_position = np.array(pvl.solarposition.get_solarposition(Data_time_index, lat_long[i][0], lat_long[i][1]))
    sol_apar_zen.append(solar_position[0][0])
print(len(sol_apar_zen))
Technically, if you need to compute the solar zenith angle quickly for a large list (array), there are more efficient algorithms than the one in pvlib. One example is the algorithm described by Roberto Grena in 2012 (https://doi.org/10.1016/j.solener.2012.01.024).
I found a suitable implementation here: https://github.com/david-salac/Fast-SZA-and-SAA-computation (you might need some tweaks, but it is simple to use, and it is also implemented in languages other than Python, such as C/C++ and Go).
Example of how to use it:
import pandas as pd
from sza_saa_grena import solar_zenith_and_azimuth_angle
# ...
# A random time series ("10min" is the current spelling of the older "10T" alias):
time_array = pd.date_range("2020/1/1", periods=87_600, freq="10min", tz="UTC")
sza, saa = solar_zenith_and_azimuth_angle(longitude=-0.12435,  # London longitude
                                          latitude=51.48728,  # London latitude
                                          time_utc=time_array)
The unit test in the project's folder shows that the error is minimal in the normal latitude range.
Since your coordinates represent a grid, another option would be to calculate the zenith angle for a subset of your coordinates and then do a 2-D interpolation to obtain the remainder. Sampling 1 in 100 points in both directions would reduce your calculation time by a factor of 10,000.
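A minimal sketch of the subset-plus-interpolation idea, assuming the pixels form a regular row/column grid and that the zenith angle varies smoothly between sampled pixels. The function expensive_zenith is a hypothetical stand-in for the real per-pixel pvlib call, and bilinear_upsample is an illustrative helper, not part of any library:

```python
import numpy as np

def bilinear_upsample(coarse, coarse_rows, coarse_cols, n_rows, n_cols):
    """Bilinearly interpolate values known on a coarse subgrid to every pixel.

    coarse[i, j] is the value at pixel (coarse_rows[i], coarse_cols[j]);
    both index arrays must be increasing and should include the last row/column.
    """
    rows = np.arange(n_rows)
    cols = np.arange(n_cols)
    # Interpolate along the column direction for each sampled row ...
    tmp = np.array([np.interp(cols, coarse_cols, r) for r in coarse])
    # ... then along the row direction for each full-resolution column.
    return np.array([np.interp(rows, coarse_rows, c) for c in tmp.T]).T

def expensive_zenith(row, col):
    # Hypothetical placeholder for the slow per-pixel solar position call
    return 30.0 + 0.01 * row + 0.02 * col

n_rows, n_cols, step = 301, 401, 100
coarse_rows = np.arange(0, n_rows, step)
coarse_cols = np.arange(0, n_cols, step)
coarse = np.array([[expensive_zenith(r, c) for c in coarse_cols]
                   for r in coarse_rows])
zenith = bilinear_upsample(coarse, coarse_rows, coarse_cols, n_rows, n_cols)
```

Here only 4 x 5 = 20 expensive evaluations fill a 301 x 401 image. It would be worth comparing a few interpolated pixels against directly computed values before trusting the chosen step.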
If you want to speed up this calculation, you can use the Numba-based implementation (if Numba is installed):
location.get_solarposition(
    datetimes,
    method='nrel_numba'
)
Otherwise you have to implement your own calculation based on vectorized NumPy arrays. I know it is possible, but I am not allowed to share my implementation. You can find the formulation if you search for "Spencer 1971 solar position".
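As a rough illustration of what such a vectorized calculation could look like, here is a sketch using the Spencer (1971) Fourier series for solar declination and the equation of time. This is not the answerer's code; the function name and argument layout are assumptions, there is no refraction correction, and it is noticeably less accurate than pvlib's NREL SPA, so results should be checked against pvlib on a sample of pixels:

```python
import numpy as np

def solar_zenith_spencer(day_of_year, utc_hour, lat_deg, lon_deg):
    """Approximate solar zenith angle in degrees (no refraction correction).

    Arguments broadcast against each other, so lat_deg and lon_deg can be
    arrays holding millions of pixel coordinates (east longitude positive).
    """
    gamma = 2.0 * np.pi * (np.asarray(day_of_year, dtype=float) - 1.0) / 365.0
    # Spencer (1971): solar declination in radians
    decl = (0.006918 - 0.399912 * np.cos(gamma) + 0.070257 * np.sin(gamma)
            - 0.006758 * np.cos(2 * gamma) + 0.000907 * np.sin(2 * gamma)
            - 0.002697 * np.cos(3 * gamma) + 0.00148 * np.sin(3 * gamma))
    # Spencer (1971): equation of time in minutes
    eot = 229.18 * (0.000075 + 0.001868 * np.cos(gamma)
                    - 0.032077 * np.sin(gamma)
                    - 0.014615 * np.cos(2 * gamma)
                    - 0.040849 * np.sin(2 * gamma))
    # Local apparent solar time in hours, then hour angle in radians
    solar_time = (np.asarray(utc_hour, dtype=float)
                  + np.asarray(lon_deg, dtype=float) / 15.0 + eot / 60.0)
    hour_angle = np.radians(15.0 * (solar_time - 12.0))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    cos_zen = (np.sin(lat) * np.sin(decl)
               + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    return np.degrees(np.arccos(np.clip(cos_zen, -1.0, 1.0)))

# One call handles a whole array of coordinates (illustrative values)
lats = np.linspace(-10.0, 10.0, 1_000_000)
lons = np.linspace(-50.0, -40.0, 1_000_000)
zenith = solar_zenith_spencer(172, 15.0, lats, lons)
```

Because every operation is an array operation, there is no Python-level loop over pixels, which is where most of the time goes in the original code.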
I have subsurface temperature data down to 300 m ocean depth (at irregular depth levels), and I want to calculate the ocean heat content for 0-300 m in Python. The cell area is computed with the CDO tool.
The formula is:
OHC = seawater density * specific heat capacity * integral of the temperature over this depth range
I was able to write this code:
# OHC calculation
def ocean_heat(temperature, cell_area):
    density = 1026  # seawater density, kg/m^3
    c_p = 3990  # specific heat capacity, J/(kg K)
    heat = temperature.sum(dim=['depth', 'lon', 'lat']) * density * c_p * cell_area
    return heat
However, the depth levels are not evenly spaced, so I think the temperature needs to be weighted by layer thickness. Can anyone explain the proper procedure for computing OHC? If there are other sources or modules for this, please let me know.
Thank you.
If your dataset is a NetCDF file I suggest taking a look at the Xarray package. It is used to work with labeled multidimensional arrays. It is very popular in Earth Science.
Here is an example from Pangeo using Xarray to calculate ocean heat content:
https://gallery.pangeo.io/repos/NCAR/notebook-gallery/notebooks/Run-Anywhere/Ocean-Heat-Content/OHC_tutorial.html
The first part is about speeding up the computation with Dask. Task 8 is where they start calculating ocean heat content.
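For the irregular depth levels specifically, one common approach is to integrate over depth with the trapezoidal rule, so that each layer is weighted by its thickness. The sketch below uses plain NumPy with made-up data; the array layout (depth, lat, lon), the cell areas, and the function name are assumptions for illustration, and the density and heat capacity are the values from the question:

```python
import numpy as np

RHO = 1026.0  # seawater density, kg/m^3 (value from the question)
C_P = 3990.0  # specific heat capacity, J/(kg K) (value from the question)

def ohc_per_unit_area(temperature, depth):
    """Trapezoidal depth integral of rho * c_p * T, in J/m^2.

    temperature: array whose first axis is depth.
    depth: 1-D increasing depths in metres (spacing may be irregular).
    """
    temperature = np.asarray(temperature, dtype=float)
    dz = np.diff(np.asarray(depth, dtype=float))       # layer thicknesses
    mid = 0.5 * (temperature[:-1] + temperature[1:])   # layer-mean temperature
    # Reshape dz so it broadcasts over the trailing (lat, lon) axes
    dz = dz.reshape((-1,) + (1,) * (temperature.ndim - 1))
    return RHO * C_P * np.sum(mid * dz, axis=0)

# Illustrative data: 4 irregular depth levels on a 2 x 3 grid
depth = np.array([0.0, 10.0, 50.0, 300.0])
temperature = np.full((4, 2, 3), 10.0)   # uniform 10 degrees everywhere
cell_area = np.full((2, 3), 1.0e8)       # m^2, hypothetical cell areas

ohc_column = ohc_per_unit_area(temperature, depth)  # J/m^2 per grid cell
ohc_total = np.sum(ohc_column * cell_area)          # J over the whole domain
```

With Xarray, DataArray.integrate("depth") applies the same trapezoidal rule along a coordinate. Note that the cell area multiplies each column before the horizontal sum, rather than multiplying the already-summed temperature.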
I was using the Google Maps Distance Matrix API in Python to calculate cycling distances between two points, using latitude and longitude. I was using a loop to process almost 300,000 rows of data for a student project (I am studying Data Science with Python). I added a debug line to output the row number and distance every 10,000 rows, but after it hummed away for a while with no results, I stopped the kernel and changed it to every 1,000 rows. With that, it took about 5 minutes to reach row 1,000. After over an hour, it was only on row 70,000. Unbelievable. I stopped execution and later that day got an email from Google saying I had used up my free trial. So not only did it work incredibly slowly, I can't even use it anymore for a student project without incurring enormous fees.
So I rewrote the code to use geometry and just calculate "as the crow flies" distance. Not really what I want, but short of any alternatives, that's my only option.
Does anyone know of another (open-source, free) way to calculate distance to get what I want, or how to use the google distance matrix API more efficiently?
thanks,
Here is some more information, as suggested. I am trying to calculate distances between "stations", and I am given latitudes and longitudes for about 300K pairs. I was going to set up a function and then apply that function to the dataframe (bear with me, I'm still new to Python and dataframes), but for now I was using a loop to go through all the pairs. Here is my code:
i = 0
while i < len(trip):
    from_coords = str(result.loc[i, 'from_lat']) + " " + str(result.loc[i, 'from_long'])
    to_coords = str(result.loc[i, 'to_lat']) + " " + str(result.loc[i, 'to_long'])
    # now to get distances!!!
    distance = gmaps.distance_matrix([from_coords],  # origin lat & long, formatted for gmaps
                                     [to_coords],  # destination lat & long, formatted for gmaps
                                     mode='bicycling')['rows'][0]['elements'][0]  # mode=bicycling to use streets for cycling
    # store the value for this row only (result['distance'] = ... would overwrite the whole column)
    result.loc[i, 'distance'] = distance['distance']['value']
    # added this bit to see how quickly/slowly the code is running
    # ... and btw it's running very slowly. had the debug line at 10000 and changed it to 1000
    # ... and i am running on an i9-9900K with 48GB ram
    # ... why so slow?
    if i % 1000 == 0:
        print(distance['distance']['value'])
    i += 1
You could approximate the distance in km with the haversine distance.
Here I have my points as lat/long pairs in random_distances, a NumPy array with shape (300000, 2):
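import numpy as np
# DistanceMetric lives in sklearn.metrics since scikit-learn 1.0
# (older releases exposed it as sklearn.neighbors.DistanceMetric)
from sklearn.metrics import DistanceMetric
dist = DistanceMetric.get_metric('haversine')
random_distances = np.random.random( (300000,2) )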
Then we can approximate the distances between consecutive points with
# n points give n - 1 consecutive pairs
distances = np.zeros( random_distances.shape[0] - 1 )
for idx in range(random_distances.shape[0] - 1):
    distances[idx] = dist.pairwise(np.radians(random_distances[idx:idx+2]), np.radians(random_distances[idx:idx+2]) )[0][1]
distances *= 6371000/1000 # to get output in km
distances now contains the distances.
The speed is all right, but it can be improved. We could get rid of the for loop, for instance; also, a 2x2 distance matrix is returned on each iteration and only one entry is used.
The haversine distance is a good approximation, but it is not exact, which I imagine the API is:
From sklearn:
As the Earth is nearly spherical, the haversine formula provides a good approximation of the distance between two points of the Earth surface, with a less than 1% error on average.
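The loop can be avoided entirely by writing the haversine formula directly with NumPy, so that all origin/destination pairs are processed in one call. This is a minimal sketch assuming a spherical Earth of radius 6371 km; the function name is illustrative, and the result is a great-circle distance, not a cycling route distance:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between paired points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Two illustrative pairs: one degree of longitude on the equator, London to Paris
from_lat = np.array([0.0, 51.5074])
from_lon = np.array([0.0, -0.1278])
to_lat = np.array([0.0, 48.8566])
to_lon = np.array([1.0, 2.3522])
distances = haversine_km(from_lat, from_lon, to_lat, to_lon)
```

With the dataframe from the question, the same function could be called once on the from_lat, from_long, to_lat and to_long columns and the result assigned to a new column.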
I'm using a Kalman filter to track the position of a vehicle, and for my measurement I have a GPX file (WGS84 format) containing the latitude, longitude, elevation and timestamp of each point given by GPS. Using this data, I computed the distance between GPS points (using the geodesic distance and the Vincenty formula), and since the timestamps are known, the time difference between the points gives the time delta. With the distance and the time delta between the points, we can calculate the velocity (= distance between points / time delta), which could then also be used as a measurement input to the Kalman filter.
However, I have read that this is only the average velocity and not the instantaneous velocity at any given point. To obtain the instantaneous velocity, it is suggested that one must take a running average, and some implementations compute the velocity directly from the time difference between the current point and the initial point. I'm a bit confused as to which method I need to use to implement this in Python.
Firstly, is the method used in my implementation correct for calculating velocity? (I also read about Doppler shifts that can be used, but sadly I only collect the GPS data through a running app (Strava) on my iPhone.)
How can the instantaneous velocity at every GPS point be calculated from my implementation? (Is the bearing information also necessary?)
What would be the error in this computed velocity? (The position error from an iPhone can be about 10 metres, the error from the distance calculation is about 1 mm, and I want to focus on as much accuracy as possible.)
Current Implementation
import gpxpy
import pandas as pd
import numpy as np
# note: vincenty was removed in geopy 2.0; geodesic is its replacement
from geopy.distance import vincenty, geodesic
import matplotlib.pyplot as plt

# Import GPS data
with open('my_run_001.gpx') as fh:
    gpx_file = gpxpy.parse(fh)
segment = gpx_file.tracks[0].segments[0]
coords = pd.DataFrame([
    {'lat': p.latitude,
     'lon': p.longitude,
     'ele': p.elevation,
     } for p in segment.points])

# Compute delta between timestamps (in seconds)
times = pd.Series([p.time for p in segment.points], name='time')
dt = np.diff(times.values) / np.timedelta64(1, 's')

# Find distance between points using Vincenty and geodesic methods
vx = []
for i in range(len(coords.lat) - 1):
    if i <= 2425:
        vincenty_distance = vincenty([coords.lat[i], coords.lon[i]], [coords.lat[i+1], coords.lon[i+1]]).meters
        vx.append(vincenty_distance)
print(vx)
vy = []
for i in range(len(coords.lat) - 1):
    if i <= 2425:
        geodesic_distance = geodesic([coords.lat[i], coords.lon[i]], [coords.lat[i+1], coords.lon[i+1]]).meters
        vy.append(geodesic_distance)
print(vy)

# Compute and plot velocity (m/s per segment)
velocity = np.array(vx) / dt
time = np.arange(len(dt))
plt.plot(time, velocity)  # time on the x axis, to match the labels
plt.xlabel('time')
plt.ylabel('velocity')
plt.title('Plot of Velocity vs Time')
plt.show()
Reference for GPX Data:
https://github.com/stevenvandorpe/testdata/blob/master/gps_coordinates/gpx/my_run_001.gpx
Interesting topic.
If you are planning to use standalone GPS location output to calculate the velocity of an object, be ready for some uncertainty in the results. As I am sure you know, there are propagation delays in the whole process, so there are several things you need to pay attention to.
1. Basically, taking the distance and time and then calculating velocity from those deltas is the right approach, but as you said, that is the average velocity between two GPS measurements, since GPS has some propagation delay by nature.
2. As noted, this kind of calculation gives the average velocity as a function of delta time and distance, and by nature we can't change that. What we can do is change the sampling frequency of the GPS signal, and thereby increase or decrease the real-time accuracy of the system.
SUGGESTION: If you want somewhat more accurate real-time velocity data, I suggest involving your phone's gyroscope sensor and processing its output. Collecting the delta (average) speed from GPS first and then detecting gyro changes would be an interesting way to continue with your idea.
3. Let's say you are walking (or superspeed running :)) with your device. At one moment, the device sends a request for a GPS location, but due to some issue (perhaps a bad satellite connection) the response arrives with a 10-second delay. For the purpose of the example, assume you are walking in an absolutely straight line on an absolutely flat surface :) One minute after receiving that response, you send another request for a GPS location, and the data you receive, with a 2-second delay, tells you that you have walked 300 m north of your previous measurement. If you measure from one sent request to the next, you get a speed of 300/70 = 4.29 m/s (quite an impressive speed), but here is one possible actual scenario:
- You didn't walk 300 m, you walked 270 m (GPS error)
- The time between the two received measurements is around 62 s
- You were even faster, at 270/62 = 4.35 m/s
With a phone it is tricky: you can't measure exactly when the request was actually sent over the air or, to the millisecond, when you got the response, and such delays are quite possible when you are working with sensors close to the hardware layer. Therefore you will certainly lose some accuracy.
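The arithmetic of the example can be checked with a few lines of Python. This is only an illustration of the numbers in the scenario above (300 m over 70 s measured request to request, versus 270 m over 62 s measured response to response); the function name is made up:

```python
def average_speed(distance_m, elapsed_s):
    """Average speed in m/s over one GPS segment."""
    return distance_m / elapsed_s

# Naive estimate: reported distance, time measured from request to request
naive = average_speed(300.0, 70.0)
# One possible "true" scenario: 270 m actually walked, 62 s between responses
corrected = average_speed(270.0, 62.0)

print(round(naive, 2), round(corrected, 2))  # 4.29 4.35
```

Even in this simple case, a 30 m position error and an 8 s timing error shift the estimate, which is why a single-segment velocity should be treated as a noisy measurement.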
k-means does not work properly for geospatial coordinates, even when changing the distance function to haversine as stated here.
I had a look at DBSCAN, which doesn't let me set a fixed number of clusters.
Is there any algorithm (in Python if possible) that takes the same input values as k-means? Or
can I easily convert latitude and longitude to Euclidean coordinates (x, y, z) as done here and do the calculation on my data?
It does not have to be perfectly accurate, but it would be nice if it were.
Using just latitude and longitude leads to problems when your geo data spans a large area, especially since the distance between lines of longitude shrinks toward the poles. To account for this, it is good practice to first convert lon and lat to Cartesian coordinates.
If your geo data spans the United States, for example, you could define an origin from which to calculate distances, such as the center of the contiguous United States. I believe this is located at latitude 39 degrees 50 minutes N and longitude 98 degrees 35 minutes W.
To convert lat/lon to Cartesian coordinates, calculate the haversine distance from every location in your dataset to the defined origin (again, I suggest latitude 39 degrees 50 minutes N and longitude 98 degrees 35 minutes W).
You can use the haversine package in Python to calculate these distances:
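from haversine import haversine, Unit
# 39 deg 50 min N, 98 deg 35 min W in decimal degrees (west longitude is negative)
origin = (39.8333, -98.5833)
paris = (48.8567, 2.3508)
# older releases of the package used miles=True instead of unit=Unit.MILES
haversine(origin, paris, unit=Unit.MILES)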
Now you can use k-means on this data to cluster, assuming the haversine model of the Earth is adequate for your needs. If you are doing data analysis and not planning on launching a satellite, I think this should be okay.
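The question also asks about converting latitude and longitude to Euclidean (x, y, z) coordinates. A minimal sketch of that conversion, assuming a spherical Earth of unit radius (the function name is illustrative):

```python
import numpy as np

def latlon_to_unit_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to points on the unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

# Equator/prime meridian, North Pole, and Paris
points = latlon_to_unit_xyz(np.array([0.0, 90.0, 48.8567]),
                            np.array([0.0, 0.0, 2.3508]))
```

The resulting (n, 3) array can be passed to an ordinary Euclidean k-means implementation. Straight-line (chord) distance between points on the sphere increases with great-circle distance, so nearby points stay nearby, including across the 180-degree meridian. Cluster centroids will generally lie slightly inside the sphere and can be normalized back to unit length if a lat/lon centre is needed.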
Have you tried kmeans? The issue raised in the linked question seems to be with points that are close to 180 degrees. If your points are all close enough together (like in the same city or country for example) then kmeans might work OK for you.
Background
I want to add a model manager function that filters a queryset based on the proximity to coordinates. I found this blog posting with code that is doing precisely what I want.
Code
The snippet below seems to make use of geopy functions that have since been removed. It coarsely narrows down the queryset by limiting the range of latitude and longitude.
# Prune down the set of all locations to something we can quickly check precisely
rough_distance = geopy.distance.arc_degrees(arcminutes=geopy.distance.nm(miles=distance)) * 2
queryset = queryset.filter(
    latitude__range=(latitude - rough_distance, latitude + rough_distance),
    longitude__range=(longitude - rough_distance, longitude + rough_distance)
)
Problem
Since some of the used geopy functions have been removed/moved, I'm trying to rewrite this stanza. However, I do not understand the calculations---barely passed geometry and my research has confused me more than actually helped me.
Can anyone help? I would greatly appreciate it.
In case anybody else is looking at this now, since I tried to use geopy and just hit up against it, the modern equivalent of the rough_distance snippet above is:
import geopy.units
rough_distance = geopy.units.degrees(arcminutes=geopy.units.nautical(miles=1))
It looks like distance in miles is being converted to nautical miles, which are each equal to a minute of arc, which are 1/60th of an arc degree each. That value is then doubled, and then added and subtracted from a given latitude and longitude. These four values can be used to form a bounding box around the coordinates.
You can look up any needed conversion factors on Wikipedia. There's also a relevant article there titled Horizontal position representation, which discusses the pros and cons of alternatives to longitude and latitude positioning that avoid some of their complexities. In other words, it covers the considerations involved in replacing latitude and longitude with another horizontal position representation in calculations.
The Earth is not a sphere, only approximately so. If you need a more accurate calculation, use pyproj. Then you can calculate the location based on a reference ellipsoid (e.g. WGS84).
martineau's answer is right on in terms of what the snippet actually does, but it is important to note that 1 minute of arc represents very different distances depending on location. At the equator, the query covers the smallest axis-aligned bounding box enclosing a circle of diameter distance, but off the equator, the bounding box does not completely contain that circle.
This code from the blog is sloppy:
def near(self, latitude=None, longitude=None, distance=None):
    if not (latitude and longitude and distance):
        return []
If latitude == 0 (equator) or longitude == 0 (Greenwich meridian), it returns immediately. The check should be: if latitude is None or longitude is None ...
@TokenMacGuy's answer is an improvement, but:
(a) The whole idea of the "bounding box" is to avoid an SQL or similar query calculating a distance to all points that otherwise satisfy the query. With appropriate indexes, the query will execute much faster. It does this at the cost of leaving the client to (1) calculate the coordinates of the bounding box (2) calculate and check the precise distance for each result returned by the query.
If step 2 is omitted, you get errors, even at the equator. For example, "find all pizza shops in a 5-mile radius" means you get answers up to 7.07 miles (that's sqrt(5**2 + 5**2)) away in the corners of the box.
Note that the code that you show seems to be arbitrarily doubling the radius. This would mean you get points 14.1 miles away.
(b) As @TokenMacGuy said, away from the equator, it gets worse. The bounding box so calculated does not include all the points that you are interested in, unless of course you are overcompensating by doubling the radius.
(c) If the circle of interest includes either the North or South Pole, the calculation is horribly inexact and needs adjusting. If the circle of interest is crossed by the 180-degree meridian (i.e. the International Date Line without the zigzags), the results are nonsense; you need to detect this case and apply a 2-part query (one part for each side of the meridian).
For solutions for problems (b) and (c), see this article.
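As a rough illustration of point (b), the longitude half-width of the box can be widened by 1/cos(latitude) so that it keeps pace with the converging meridians. The sketch below is an assumption-laden approximation on a spherical Earth, not the method from the linked article: the function name is made up, and it does not handle the poles or the 180-degree meridian from point (c):

```python
import math

METERS_PER_MILE = 1609.344
METERS_PER_NAUTICAL_MILE = 1852.0  # one nautical mile is about one arcminute of latitude

def bounding_box(latitude, longitude, radius_miles):
    """Approximate lat/lon box around a circle of the given radius.

    Returns (lat_min, lat_max, lon_min, lon_max) in degrees.
    """
    # miles -> nautical miles (arcminutes) -> degrees of latitude
    dlat = radius_miles * METERS_PER_MILE / METERS_PER_NAUTICAL_MILE / 60.0
    # Widen the longitude range using the box edge farthest from the equator;
    # the cap at 89.9 degrees only avoids dividing by zero near a pole.
    max_abs_lat = min(abs(latitude) + dlat, 89.9)
    dlon = dlat / math.cos(math.radians(max_abs_lat))
    return (latitude - dlat, latitude + dlat,
            longitude - dlon, longitude + dlon)

lat_min, lat_max, lon_min, lon_max = bounding_box(60.0, 10.0, 5.0)
```

The four values would go into the latitude__range and longitude__range filters, and each returned row still needs the precise distance check described in step (2) above.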
If the coordinates on the Earth are known, you can use geopy to get a good estimate of the scale between decimal degrees and miles (or any other distance unit) at that point:
from geopy import distance
# cur_lat, cur_long and radius_miles are assumed to be defined already
SCALE_VAL = 0.1
lat_scale_point = (cur_lat + SCALE_VAL, cur_long)
long_scale_point = (cur_lat, cur_long + SCALE_VAL)
cur_point = (cur_lat, cur_long)
lat_point_miles = distance.distance(cur_point, lat_scale_point).miles
long_point_miles = distance.distance(cur_point, long_scale_point).miles
# Assumes that `radius_miles` is the range around the point you want to look for
lat_rough_distance = (radius_miles / lat_point_miles) * SCALE_VAL
long_rough_distance = (radius_miles / long_point_miles) * SCALE_VAL
Some caveats:
Special-case handling for the scale points is needed around the poles or the prime meridian
Depending on how large or small you want your radius to be, you could pick a more appropriate SCALE_VAL